FOREWORD


This document is a description of the project that represents the first
phase of the Army's long-term research effort to improve the selection,
classification, and utilization of Army enlisted personnel. The thrust for the
project came from the practical, professional, and legal need to validate the
Armed Services Vocational Aptitude Battery (ASVAB, the current U.S. military
selection/classification test battery) and other selection variables as
predictors of training success and job performance.

The portion of the effort described herein was devoted to the development
and validation of Army Selection and Classification measures, referred to
as "Project A." Project A was conducted under contract by the Selection and
Classification Technical Area (SCTA) of the Manpower and Personnel Research
Laboratory (MPRL) at the U.S. Army Research Institute for the Behavioral and
Social Sciences (ARI). The research supports the MPRL and SCTA mission to
improve the Army's capability to select and classify its applicants for
enlistment or reenlistment by ensuring that fair and valid measures are
developed for evaluating applicant potential based on expected job performance and
utility to the Army.

Project  A  was  authorized  through  a  letter,  Deputy  Chief  of  Staff  for 
Operations,  "Army  Research  Project  to  Validate  the  Predictive  Value  of  the 
Armed  Services  Vocational  Aptitude  Battery,"  effective  19  November  1980  and  a 
Memorandum,  Assistant  Secretary  of  Defense  (MRA&L),  "Enlistment  Standards," 
effective  11  September  1980. 

To ensure that Project A research achieved its full scientific potential
and would be useful to the Army, an advisory group comprised of Army general
officers, interservice scientists, and experts in personnel measurement,
selection, and classification was established. Members of the expert
component provided guidance on technical aspects of the research, while general
officer and interservice components oversaw the entire research effort,
provided military judgment, provided periodic reviews of the project's progress,
results, and plans, and coordinated within their commands. Members of the
General Officers' Advisory Group varied during the 7-year period covered by
this report. Throughout the course of the project, this group was briefed on
the plans and results of the various research phases and provided continuing
military guidance. Members of Project A's Scientific Advisory Group guided
the technical quality of the research. During the period covered by this
report members included Drs. Philip Bobko, Thomas Cook, Milton Hakel (Chair),
Lloyd Humphreys, Lawrence Johnson, Robert Linn, Mary Tenopyr, and Jay Uhlaner.
This group was briefed throughout the project on the technical concepts,
plans, and implementation results and provided advice on the further
development of classification and assignment principles and procedures.
This final report on Project A summarizes the development and evaluation
work done during the three main phases of the research: (a) analysis of file
data on an FY81/82 accession sample to compare their ASVAB scores and their
subsequent Army performance; (b) selection of a representative sample of
entry-level MOS, and development and testing of predictor and job performance
measures with a sample of FY83/84 accessions; and (c) administration of the
revised predictor tests to a large sample of FY86/87 accessions and evaluation
of their subsequent first-tour performance. The products from this
comprehensive research undertaking have application both in present Army personnel
operations and in continuing efforts to improve the selection and
classification system.


EDGAR  M.  JOHNSON 
Technical  Director 


IMPROVING THE SELECTION, CLASSIFICATION, AND UTILIZATION OF ARMY ENLISTED
PERSONNEL:  FINAL  REPORT  ON  PROJECT  A 


EXECUTIVE  SUMMARY 


Requirement: 

Project A was a comprehensive U.S. Army program to develop an improved
system to select and classify enlisted personnel. The system encompasses
675,000 persons and several hundred Military Occupational Specialties (MOS).
The objectives were to (a) validate existing selection measures against both
existing and project-developed criteria and develop new measures, (b) validate
early criteria (e.g., performance in training) as predictors of later criteria
(e.g., job performance) to improve assignment and promotion decisions, and
(c) determine the relative utility to the Army of different performance levels
across MOS.


Procedure: 

With the Deputy Chief of Staff for Personnel as sponsor, work on the
long-term project was begun in 1982. In the first stage, relationships
between the scores applicants made on the Armed Services Vocational Aptitude
Battery (ASVAB) and their later performance in training and first-tour skill
tests were explored using file data for FY81/82 Army accessions.

The second stage was executed with FY83/84 accessions in 19 MOS,
selected as representative of the Army's 250+ entry-level MOS and accounting
for 45 percent of Army accessions. A preliminary battery of predictor
measures (perceptual, spatial, temperament, interest, and biodata) was tested
with several thousand soldiers as they entered four MOS; revised versions were
field tested with nine MOS. The resulting predictor battery and a
comprehensive set of school knowledge tests, job knowledge tests, hands-on tests, and
performance ratings were administered in 1985 to 9,500 soldiers in 19 MOS in
the "Concurrent Validation." The results were used to analyze the components
of first-tour performance on the job (General Soldiering Skills, MOS-Specific
Skills, Leadership/Effort, Personal Discipline, Military Bearing/Physical
Fitness), and to compare the validities of the current ASVAB composites and
the added predictor measures for predicting job performance.

In the third stage, known as the "Longitudinal Validation," the revised
predictor measures were used to test more than 49,000 recruits at the time
they entered 21 MOS in FY86/87. Soldiers from this sample were tested on
their performance during training and are being tested during their first tour
on the job. Soldiers from the FY83/84 sample were also tested on their
second-tour performance.
Findings:


Project A products are of two general kinds: products for the "science"
(personnel research) and products for the organization (the Army). However,
many products are useful for both fields.

(1) Comprehensive reviews exist, in technical report form, of all
validity evidence pertaining to selection and classification for
skilled jobs. These are the most comprehensive reviews of this
type ever done.

(2) Using much more comprehensive samples than ever before, new ASVAB
aptitude area composites have been developed that are firmly data
based and empirically defensible. The analyses involving ASVAB
have resulted in a much clearer idea of its factor structure, of
what the factors are measuring, and of its strengths and
limitations.

(3) The question of whether ASVAB does or does not predict job
performance (in addition to training performance) has been answered
definitively in the affirmative. The Army and the Department of
Defense are now in a more informed position to support their
quality goals.

(4) A set of new experimental tests has been developed to measure
noncognitive, psychomotor, perceptual, and cognitive
characteristics that are not now measured by the ASVAB. The scope of
Project A made it possible to examine virtually the entire domain
of selection information, sample from it, and investigate the
basic incremental validity produced by each major piece of
information.

(5)  Within  the  limits  of  the  Concurrent  Validation  design,  the 
incremental  validity  of  appropriate  ABLE  temperament  scales  for 
predicting  the  "will  do"  components  of  performance  has  been 
demonstrated. The potential of the AVOICE interest scales for
differentially predicting "can do" performance in combat vs.
technical vs. administrative support MOS has been established.

(6)  Much  has  been  learned  about  the  nature  of  performance  in  entry- 
level  skilled  jobs  (e.g.,  first-tour  MOS).  We  now  have  a  much 
clearer idea of what major factors constitute performance and how
they  can  be  measured. 

(7) The Project A job/task analysis procedures worked well and can be
used by the Army in the future to develop training curricula,
Skill Qualification Test content, performance measures, and field
exercises.
(8) Advanced Individual Training (AIT) achievement measures have been
developed for 21 MOS. The training measures will allow a
determination of whether training performance predicts job performance,
and whether it does so differentially for different groups of
trainees (race, gender), and different groups of MOS (combat,
combat support, combat service support).

(9) The package of rating scale administration procedures can be used
in future personnel research in the Army. A major effort in
Project A was to develop an effective and efficient set of
procedures for administering performance rating scales to large
numbers of people.

(10) The data indicate that supervisor ratings of subordinate
performance have considerable construct validity if a careful
measurement procedure is followed. Supervisors seem to assess
both the technical performance of individuals and their general
dependability/motivation at the same time.

(11) One very real, and very important, product is the Project A data
base itself. It is by orders of magnitude the largest and most
completely documented personnel research data base in existence.


Utilization  of  Findings: 

The  Project  A  tests  for  predicting  and  measuring  training  and  job 
performance  are  being  used  in  both  current  and  long-range  research  programs 
that  are  expected  to  make  the  Army  more  effective  in  matching  the  requirements 
for  first-  and  second-tour  enlisted  manpower  with  the  personnel  resources  that 
are available to the Army. Additionally, Project A findings have already been
used to make substantial improvements to the existing selection and
classification system.
IMPROVING THE SELECTION, CLASSIFICATION, AND UTILIZATION
OF ARMY ENLISTED PERSONNEL: FINAL REPORT ON PROJECT A

Chapter  1 
INTRODUCTION 


AN  OVERVIEW  OF  PROJECT  A 

The Army annually contacts 400,000 to 500,000 young men and women,
selects 90,000 to 130,000 of them, and assigns each individual to one of
some 275 occupational specialties. Project A: Improving the Selection,
Classification, and Utilization of Army Enlisted Personnel, and Project B: An
Enlisted Personnel Allocation System, were designed to provide the greatest
possible increase in overall performance and readiness that can be obtained
from improved selection, classification, and allocation of enlisted personnel.
These two research programs provided an integrated examination of performance
measurement, selection/classification, supply and demand parameters, and
allocation procedures to enable the Army to attempt to optimize the achievement
of multiple personnel management goals (e.g., increase performance and
decrease attrition).

The  broad  responsibilities  of  Project  A  were  to  develop: 

• A comprehensive set of new predictor measures, following on
validation of existing measures.

• Multiple measures of job performance, against which selection/
classification measures can be evaluated.

• Accurate estimates of the predictability of future performance.

• Decision rules for selection/classification at enlistment and
reenlistment to optimize individual and system performance.

• A way of evaluating the relative utility to the Army of different
performance levels across MOS.


Origins of Project A

The impetus for Project A came from the practical, professional, and
legal need to demonstrate the validity of the Armed Services Vocational
Aptitude Battery (ASVAB) and other selection variables for predicting job
performance. Much of the existing validity data was based on using training
measures as criteria.

In response to Army, Congressional, and professional requirements, the
Army Research Institute (ARI) began in 1980 to develop a major new research
program for personnel selection, classification, and allocation. The basic
requirement was to demonstrate the validity of the ASVAB as a predictor of
both training and on-the-job performance. In reviewing the design needed to
meet that requirement, the concept of a larger project began to emerge. With
only a moderate amount of additional resources, new selection/classification
measures in the perceptual, psychomotor, interest, temperament, and biodata
domains could be evaluated as well. In addition, a longitudinal research data
base could be developed, linking soldiers' performance on a variety of
variables from enlistment, through training, first-tour assignments,
reenlistment decisions, and for some, to their second tour. Finally, the
validation data could be the basis for new methods of allocating personnel,
and making near-real-time decisions on the best match between characteristics
of an individual enlistee or reenlistee and requirements of available Army
Military Occupational Specialties (MOS).

To address the selection and classification portion of the effort,
solicitation HDA M3-81-12-R-01S8 "Project A: Development and Validation of
Army Selection and Classification Measures" was issued 21 October 1981. This
document can be viewed as the official starting point of Project A. The
research program was intended to bring together Army and contractor research
personnel in a combined effort to meet the Army's requirements for improving
the processes and programs for selecting and classifying enlisted personnel.
In the solicitation, the Army psychologists mapped out a comprehensive 7-year
research program to provide the instrumentation and data necessary to
implement a state-of-the-art selection and classification system for all enlisted
personnel. (To provide background, a description of the present Army
personnel system is included as Appendix A.)

While the contract solicitation process was ongoing, the new Manpower
and Personnel Research Laboratory was created within ARI, and Dr. Joyce L.
Shields was chosen as director. To accommodate the substantial in-house
portion of Project A, the Selection and Classification Technical Area was
established, with Dr. Newell K. Eaton as chief.

Formation of the Consortium

In anticipation of the solicitation, the Human Resources Research
Organization (HumRRO), American Institutes for Research (AIR), and Personnel
Decisions Research Institute (PDRI) formed a consortium to develop a research
proposal to meet the requirements of the forthcoming "Development and
Validation of Army Selection and Classification Measures" Request for Proposal
(RFP). It was agreed that HumRRO, as prime contractor, would assume
responsibilities for overall contract management, technical direction, planning, and
reporting. The proposal was submitted in January 1982 and the contract was
awarded to the HumRRO-AIR-PDRI consortium 30 September 1982.

Project  Outline 

The  overall  purpose  of  Project  A  was  to  enhance  the  Army's  ability  to 
accomplish  its  peacetime  and  mobilization  missions  through  improved  matching 
of  individuals  to  Military  Occupational  Specialties.  Specifically,  Project  A 
was  to 

(1) Validate existing selection measures against both existing and
project-developed criteria, the latter to include both Army-wide
performance measures based on newly developed rating scales and
direct measures of MOS-specific task performance.

(2) Develop and validate new and/or improved selection and
classification measures.
(3) Validate proximal criteria, such as performance in training, as
predictors of later criteria, such as job performance ratings, so
that more informed reassignment and promotion decisions can
be made throughout the individual's tour.

(4)  Determine  the  relative  utility  to  the  Army  of  different 
performance  levels  across  MOS. 

(5) Estimate the relative effectiveness of alternative selection and
classification procedures in terms of their validity and utility
for making operational selection and classification decisions.

The Statement of Work required that Project A be designed as one
integrated project organized into five major tasks:


Task 1. Validation. Task 1 had two major components. The first was to
develop and maintain the data base and provide the analytic procedures to
determine the degree to which performance in Army jobs is predictable from
some combination of new or existing measures. The second component was to
conduct the appropriate analyses to determine whether the existing set of
predictors, new predictors, or some combination of new and existing predictors
has utility over and above the present system.


Task 2. Develop Predictors of Job Performance. A large proportion of
the efforts of the Armed Services in this regard have been concentrated on
improving the ASVAB, which is now a well-researched, valid measure of general
cognitive abilities. However, many critical Army tasks appear to require
psychomotor and perceptual skills for their successful performance. Further,
neither biodata nor motivational variables were comprehensively evaluated.
The objectives of Task 2 were to develop a broad array of new and improved
selection measures and to administer them to three major validation samples.
A critical aspect of this task was to be the demonstration of the incremental
validity added by new predictors.
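As an illustration of what "incremental validity" means in practice, the following is a minimal sketch, not the project's actual analysis. It compares the multiple correlation obtained from an existing predictor alone with the multiple correlation obtained when a new predictor is added. The data are synthetic, and the variable names (cognitive, temperament, criterion), the sample size, and the effect sizes are all assumptions made for the example.

```python
import numpy as np


def multiple_r(predictors, criterion):
    """Multiple correlation between a criterion and its least-squares
    prediction from the given predictor columns (intercept included)."""
    design = np.column_stack([np.ones(len(criterion)), predictors])
    weights, *_ = np.linalg.lstsq(design, criterion, rcond=None)
    predicted = design @ weights
    return float(np.corrcoef(predicted, criterion)[0, 1])


# Synthetic data only: hypothetical stand-ins, not Project A scores.
rng = np.random.default_rng(0)
n = 500
cognitive = rng.normal(size=n)      # stand-in for an existing composite
temperament = rng.normal(size=n)    # stand-in for a new predictor
criterion = 0.5 * cognitive + 0.3 * temperament + rng.normal(size=n)

r_base = multiple_r(cognitive, criterion)
r_full = multiple_r(np.column_stack([cognitive, temperament]), criterion)

# Incremental validity: gain in multiple R from adding the new predictor.
incremental = r_full - r_base
print(f"baseline R = {r_base:.3f}, full R = {r_full:.3f}, "
      f"increment = {incremental:.3f}")
```

Because the baseline model is nested in the full model, the in-sample increment can never be negative; an operational analysis would also need to correct for shrinkage and range restriction, which this sketch does not attempt.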


Task 3. Measurement of School/Training Success. The objective of Task
3 was to derive school and training performance indexes that could be used
(a) as criteria against which to validate the initial predictors, and (b) as
predictors of later job performance.


Task 4. Assessment of Army-Wide Performance. In contrast to
performance measures that may be developed for a specific Army MOS, Task 4 was to
develop measures that could be used across all MOS (i.e., Army-wide). The
intent was to develop measures of first- and second-tour job performance
against which all Army enlisted personnel could be measured. A major
objective was to develop a model of soldier effectiveness that specifies the
major dimensions of an individual's contribution to the Army as an
organization. Another important objective of Task 4 was to develop a procedure that
could be used to scale the utility of levels of performance.

Task 5. Develop MOS-Specific Performance Measures. Task 5 was focused
on developing reliable and valid measures of specific job task performance for
a selected set of MOS. This task had three major components: job analysis,
construction of job performance measures, and construct validation of the new
measures. While only a subset of MOS were analyzed during this project, the
Army may in the future wish to develop job performance measures for a larger
number of MOS. For this reason, the methodology was to apply to all Army MOS.

Initial  Project  Organization 

The initial project organization is shown in Figure 1-1. The principal
consortium task scientists are shown, with their respective organizations, in
the lower row. The principal ARI scientists are shown in the upper row.
Consortium and ARI scientists carried out research activities both
independently and jointly. ARI scientists also had the administrative role of
contract oversight.

We  include  this  diagram  only  to  show  the  matching  of  contractor  and  ARI 
staff  and  to  illustrate  the  form  of  the  project  management  and  contract  review 
structure.  There  were  of  course  a  number  of  personnel  changes  over  the  life 
of  the  project. 

A project of this scale would have to maintain close and active
coordination with the other military departments and the Department of Defense, as
well  as  remain  consistent  with  other  ongoing  research  programs  being  conducted 
by  the  other  Armed  Services.  The  project  also  needed  a  mechanism  for  assuring 
that  the  research  program  met  the  highest  standards  for  scientific  quality. 
Finally,  a  method  was  needed  to  receive  feedback  from  senior  officers  on 
priorities  and  objectives,  as  well  as  to  identify  current  problems.  An 
effective  mechanism  for  meeting  these  needs  was  deemed  to  be  a  structure  of 
advisory  groups. 

Figure 1-2 shows the structure and membership of the Governance Advisory
Group (GAG), which is made up of the Scientific Advisory Group (SAG),
Interservice Advisory Group (ISAG), and Army Advisory Group (AAG) components.

The SAG was comprised of nationally recognized authorities in
psychometrics, experimental design, sampling theory, utility analysis, applied
research  in  selection  and  classification,  and  the  conduct  of  psychological 
research  in  the  Army  environment.  It  is  perhaps  indicative  of  the  substance 
and  success  of  Project  A  that  all  members  of  the  Scientific  Advisory  Group 
remained  with  the  project  from  its  beginning  to  the  end. 

The ISAG was comprised of the Laboratory Directors for applied
psychological research in the Army, Air Force, and Navy, and the Director of Accession
Policy from the Office of Assistant Secretary of Defense for Manpower and
Reserve Affairs. The AAG included representatives from the Office of Deputy
Chief of Staff for Personnel (DCSPER), Office of Deputy Chief of Staff for
Operations (DCSOPS), Training and Doctrine Command (TRADOC), Forces Command
(FORSCOM), and U.S. Army Europe (USAREUR).
Development of the Research Plan and the Integrated Master Plan

The first 6 months of the project were spent in planning, documenting,
reviewing, modifying, and redrafting of research plans, troop support,
administrative support, and budgetary plans, as well as in execution of
initial research efforts. Drafts of the plans were provided to the SAG and
ISAG. The culminating review was conducted in April 1983 by the Army Advisory
Group, with representatives from the Scientific and Interservice Advisory
Groups. The research program was endorsed by all three components of the GAG,
and in May 1983, ARI issued Research Report 1332, Improving the Selection,
Classification, and Utilization of Army Enlisted Personnel: Project A Research
Plan.


An  Outline  of  the  Project  A  Research  Plan 

The  Project  A  Research  Plan  spoke  to  the  specific  operational  and 
scientific  outcomes  that  would  flow  from  the  project. 

Operational  Objectives 

The operational objectives were to:

(1) Develop new measures of job performance that can be used as
criteria against which to validate selection/classification
measures.

(2)  Validate  existing  selection  measures  against  both  existing  and 
project-developed  criteria. 

(3)  Develop  and  validate  new  selection  and  classification  measures. 

(4)  Develop  a  utility  scale  for  different  performance  levels  across 
MOS. 

Research  Objectives 

The research objectives were to:

(1) Identify the constructs that constitute the universe of
information available for selection/classification into
entry-level skilled jobs.

(2) Develop a general model of performance for entry-level skilled
jobs.

(3) Investigate the construct validity of the "method" variance in job
performance measures.

(4) Estimate the value of different levels of job performance.

(5) Estimate the degree of differential prediction across (a) major
domains of predictor information (e.g., abilities, temperament,
interests), (b) major factors of job performance, and
(c) different types of jobs.
(6) Determine the extent of differential prediction across racial and
gender groups for a systematic sample of individual differences,
performance factors, and jobs.


The overall design of Project A used two predictive and one concurrent
validation on two major troop cohorts (1983/1984 accessions and 1986/1987
accessions), and one file data validation on the 1981/1982 cohort. That is,
in addition to collecting data from new samples, the project made use of
existing file data for 1981 and 1982 accessions. Data from the accessions and
Enlisted Master Files (EMF) were edited and merged into the Longitudinal
Research Data Base (LRDB). A schematic of the data collection plan is shown
in Figure 1-3.

The logic of the design was straightforward. Existing file data on the
81/82 cohort would provide an early opportunity to modify the existing
operational selection and classification decision rules; and in fact, the file
data analyses were used to recommend changes in the composition of the ASVAB
Aptitude Area composites. The 83/84 cohort provided the first opportunity to
obtain data using new predictor and performance measures. A "preliminary"
battery of predominantly off-the-shelf tests provided new predictor data on
soldiers in four MOS (05C, 19E/K, 63B, 71L). These data together with an
exhaustive literature search, job analysis information, and multiple expert
panel reviews provided the information to construct a more tailored trial
battery which was administered concurrently with a variety of training,
Army-wide, and MOS-specific performance measures in 1985 to the 1983/84 cohort.

The refinement of these measures resulted in the Experimental Predictor
Battery which was administered to a longitudinal sample from the FY86/87
cohort. The job performance criterion measures were administered to this
cohort during late 1988. In addition, at this same time second-tour
performance measures were developed for and administered to the FY83/84 cohort as
part of a longitudinal followup of that sample into its second tour.


The overall objective in generating the samples was to maximize the
validity and reliability of the information to be gathered, while at the same
time minimizing the time and costs involved. While costs are a function of
the numbers of people in the sample, they are also influenced by the relative
difficulty involved in locating and assembling the people in a particular
sample.


The sampling plan itself incorporated two principal considerations.
First, a sample of MOS was selected from the universe of possible MOS. Then,
the required sample sizes of enlisted personnel within each MOS were
specified. Because Project A was developing a system for a population of jobs
(MOS), the MOS are the primary sampling units.


Figure 1-3. The overall research design for Project A.

There is a trade-off in the allocation of resources between the number
of MOS researched and the number of subjects tested within each MOS: the more
MOS are investigated, the fewer subjects per MOS can be tested, and vice
versa. Cost and statistical reliability considerations dictated that 19 MOS
could be studied. The new predictors (from Task 2) as well as the school and
Army-wide performance measures (of Tasks 3 and 4) were administered to all 19.
For nine of the 19 MOS, the MOS-specific performance measures developed in
Task 5 were also administered; the nine MOS were chosen to provide maximum
coverage, given certain statistical constraints, of the total array of
knowledge, ability, and skill requirements of Army jobs.
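The trade-off between the number of MOS and the number of soldiers per MOS can be made concrete with a rough sketch. It assumes, purely for illustration, that a fixed total of 9,500 soldiers (the Concurrent Validation figure cited in the Executive Summary) is divided equally among MOS, that the validity coefficient in each MOS is a hypothetical 0.30, and that the usual large-sample approximation for the standard error of a correlation applies. The report does not state that samples were allocated equally or that this formula was used.

```python
import math

TOTAL_SOLDIERS = 9500   # Concurrent Validation total cited in the report
ASSUMED_VALIDITY = 0.30  # hypothetical validity coefficient, not a finding


def soldiers_per_mos(total, n_mos):
    """Equal allocation of a fixed testing budget across MOS (an assumption)."""
    return total // n_mos


def se_of_correlation(r, n):
    """Large-sample approximation to the standard error of a correlation."""
    return (1.0 - r ** 2) / math.sqrt(n - 1)


for n_mos in (10, 19, 38):
    n = soldiers_per_mos(TOTAL_SOLDIERS, n_mos)
    se = se_of_correlation(ASSUMED_VALIDITY, n)
    print(f"{n_mos:>2} MOS -> {n:>4} soldiers per MOS, SE(r) ~ {se:.4f}")
```

Doubling the number of MOS halves the per-MOS sample and inflates the standard error of each within-MOS validity estimate by roughly the square root of two, which is the sense in which coverage and precision compete.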

The selection of the sample of 19 MOS proceeded through a series of
stages. An initial sample of MOS was drawn on the basis of the following
considerations:

(1) High-density MOS that would provide sufficient sample sizes
for statistically reliable estimates of new predictor validity and
differential validity across racial and gender groups.

(2)  Representative  coverage  of  the  aptitude  areas  measured  by  the 
ASVAB  area  composites. 

(3) High-priority MOS (as rated by the Army in the event of a national
emergency).

(4)  Representation  of  the  Army's  designated  Career  Management  Fields 
(CMF). 

(5)  Representation  of  the  jobs  most  crucial  to  the  Army's  mission. 

A further indirect indication of the mix of job skills represented in
the sample is in the range of ASVAB composites and component subtests
pertinent to each MOS. The ASVAB subtests are Word Knowledge (WK), Paragraph
Comprehension (PC), Arithmetic Reasoning (AR), Numerical Operations (NO),
General Science (GS), Mechanical Comprehension (MC), Math Knowledge (MK),
Electronics Information (EI), Coding Speed (CS), and Auto-Shop Information
(AS). The WK and PC subtest raw scores are summed to create an additional
Verbal (VE) subtest. The composites, combinations of subtests to characterize
aptitude areas, are Clerical (CL), Combat (CO), Electronics (EL), Field
Artillery (FA), General Maintenance (GM), Mechanical Maintenance (MM),
Operators/Food (OF), Surveillance and Communication (SC), and Skilled
Technical (ST).

All subtests and all but one (Electronics) of the nine composites were
represented in the 18 MOS initially selected. Consequently, a 19th MOS (27E)
was chosen to represent the EL aptitude composite. The composition of the
sample was also examined from the perspective of mission criticality by
comparing it with a list of 42 MOS identified by the Army as high priority for
mobilization training.¹ This initial set of 19 MOS represents 19 of the Army's
30 CMF. Of the 11 CMF not represented, two are classified (CMF 96 and 98),
two (CMF 33 and 74) had fewer than 500 FY81 accessions, and seven (CMF 23, 28,
29, 79, 81, 84, and 74) had fewer than 300 FY81 accessions. The initial MOS
set included only 5 percent of Army jobs but 44 percent of the soldiers
recruited in FY81. Similarly, of the 15 percent women in the Army, 44 percent
are represented in the sample.


¹ODCSOPS (DAMO-ODM), OF, 2 Jul 82, Subject: IRR Training Priorities.

Guidance from the Scientific Advisory Group led to further refinement of
the MOS sample. A cluster analysis of expert ratings of MOS similarity was
made, and the initial sample was reviewed by the Governance Advisory Group.

To obtain data for empirically clustering MOS on the basis of their task
content similarity, a brief job description was generated for each of 111 MOS
from the job activities described in AR 611-201.² The sample of 111 MOS
included the 64 largest MOS (300 or more new job incumbents yearly) plus an
additional 27 selected randomly but proportionately by CMF. Each job
description was limited to two sides of a 5x7 index card.

Members of the contractor research staff and ARI Army officers (N = 25),
serving as expert judges, sorted the sample of 111 job descriptions into
homogeneous categories based on perceived similarities and differences in the
described job activities. The similarity data were clustered and used to
check the representativeness of the initial sample of 19 MOS. (That is, did
the 19 MOS include representatives from all the major clusters of MOS derived
from the similarity scaling?) On the basis of these results and guidance
received from the Governance Advisory Group, two MOS that had been selected
initially were replaced.
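
The sorting-and-clustering check described above can be sketched in a few
lines. This is only an illustration under stated assumptions: the report does
not say which clustering algorithm was used, so simple threshold-based
single-linkage grouping stands in for it, and the judges' sorts, the
threshold, and the function names (cooccurrence, clusters_at, uncovered) are
all hypothetical.

```python
from itertools import combinations

def cooccurrence(sorts, items):
    """Proportion of judges who placed each pair of items in the same pile."""
    counts = {pair: 0 for pair in combinations(sorted(items), 2)}
    for piles in sorts:                      # one judge's complete sort
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}

def clusters_at(similarity, items, threshold):
    """Single-linkage stand-in: link any pair with similarity >= threshold."""
    parent = {i: i for i in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for (a, b), s in similarity.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for i in items:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def uncovered(clusters, sample):
    """Clusters that have no member in the selected sample."""
    return [c for c in clusters if not (c & set(sample))]

# Invented data: three hypothetical judges sorting five MOS into piles.
items = ["11B", "19E", "63B", "71L", "76Y"]
sorts = [
    [{"11B", "19E"}, {"63B"}, {"71L", "76Y"}],
    [{"11B", "19E"}, {"63B", "76Y"}, {"71L"}],
    [{"11B", "19E", "63B"}, {"71L", "76Y"}],
]
sim = cooccurrence(sorts, items)
groups = clusters_at(sim, items, threshold=0.6)
missing = uncovered(groups, sample=["11B", "71L"])
```

With these toy sorts, a sample consisting of 11B and 71L leaves the cluster
containing only 63B unrepresented, which is the kind of gap the
representativeness check was meant to expose.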

The initial sample of 19 MOS resulting from the above procedures is
shown in Table 1-1. The subsample of nine MOS to which the MOS-specific
criterion measures were administered is shown as Batch A. During the course
of the project, some MOS changed names or numbers, and some were added or
deleted because requirements changed. The MOS lists in the report reflect
these changes as they occurred. One of the original MOS (76W) was deleted and
three MOS (19K, 20E, and 96B) were added, making a total of 21 MOS in the
sample during the later stages of the research.


Table 1-1

Initial List of Project A Military Occupational Specialties (MOS)

Batch A*                                Batch Z

05C^  Radio Teletype Operator           12B  Combat Engineer
11B   Infantryman                       16S  MANPADS Crewman
13B   Cannon Crewman                    27E  TOW/Dragon Repairer
19E   Tank Crewman                      51B  Carpentry/Masonry Specialist
63B   Vehicle & Generator Mechanic      54E  Chemical Operations Specialist
64C   Motor Transport Operator          55B  Ammunition Specialist
71L   Administrative Specialist         67N  Utility Helicopter Repairer
91A   Medical Care Specialist           76W  Petroleum Supply Specialist
95B   Military Police                   76Y  Unit Supply Specialist
                                        94B  Food Service Specialist

* MOS-specific criterion measures were administered in these MOS.
^ MOS 05C later became MOS 31C.


²Army Regulation 611-201, Enlisted Career Management Fields and
Military Occupational Specialties.


ORGANIZATION  OF  THIS  PROJECT  A  REPORT 

Given the basic design just described, the remainder of this report
summarizes the substantive work of Project A from October 1982 through March
1990. Since Project A was large in scope, the summary is not short. The
intent was to provide enough detail to permit a judgment about the
thoroughness and appropriateness of the work done at each step.

The content of the summary was assembled from the FY83, FY84, FY85,
FY86, FY87, and FY88 project annual reports, which in turn were based on very
detailed technical reports, working papers, and convention papers on
specialized topics. The full Bibliography of reports, papers, and products for
the duration of Project A is included as Appendix B. The names of the people
who worked on Project A are presented in Appendix C.


The major topics covered in this final report are:

• Development of new selection/classification (predictor) tests.
• Development of new measures of training and job performance.
• Concurrent Validation procedure.
• Development of basic prediction and criterion scores.
• Results of the Concurrent Validation.
• Development of differential weights for the major components of
  job performance.
• The scaling of the utility of performance in entry-level jobs.
• Job analyses and criterion development for second-tour MOS.
• Samples and procedures for the Longitudinal Validation.

The  final  chapter  of  the  report  discusses  the  Project  A  research  in  the 
context  of  selection  and  classification  history,  and  highlights  its  products 
and findings in terms of both basic and applied research concerns and goals.


Chapter  2 

PREDICTOR  DEVELOPMENT 


SELECTION  OF  VARIABLES 

The overall goal of predictor development in Project A was to construct
an experimental test battery that would, when combined with ASVAB, yield the
maximum increment in selection/classification validity for the entire system.
That is, what new tests should be used in conjunction with ASVAB to increase
the aggregate accuracy of selection and classification decisions over all MOS
in the enlisted personnel system? Approximately 280 MOS now use ASVAB for
such decisions.

Given this overall goal, the Project A research staff adopted a very
comprehensive approach that tried to (a) define the population of potentially
useful variables; (b) describe its latent structure; (c) sample
constructs from this population that had the highest probability of meeting
the goals of the project; (d) construct operational measures of these
variables; (e) pilot test, field test, and revise the new measures; (f) analyze
their empirical covariance structure; and (g) determine their predictive
validities, and specify the optimal decision rules for using the new tests to
maximize predicted performance and/or minimize attrition. The major steps
that were taken to execute this approach are described in this chapter (also
see Peterson, 1986; Peterson et al., 1987).

Review of Selection/Classification Literature

The overriding purpose of the literature review was to gain maximum
benefit from earlier research that was even remotely relevant for the jobs in
the Project A job population. The search was conducted in late 1982 and early
1983 (i.e., FY83) by three teams of project staff.

Several computerized searches of all relevant data bases resulted in
identification of more than 10,000 sources. In addition, reference lists were
solicited from recognized experts, annotated bibliographies were obtained from
military research laboratories, and the last several years' editions of
relevant research journals were examined, as were more general sources such as
textbooks, handbooks, and appropriate chapters in the Annual Review of
Psychology.

The references identified as relevant were obtained, reviewed, and
summarized using a standardized report protocol of seven sections:
description of predictor, reliability, norms/descriptive statistics,
correlations with other predictors, correlations with criteria, adverse
impact/differential validity/test fairness, and reviewer's recommendations
(about the usefulness of the predictor). Each predictor was tentatively
classified into an initial, working taxonomy of predictor constructs (based
primarily on the taxonomy described in Peterson and Bownas, 1982).

Literature Search Results

The literature search was used in two major ways. First, three working
documents were written, one for each of three areas: cognitive/perceptual
abilities, psychomotor/perceptual abilities, and non-cognitive predictors
(including temperament or personality, vocational interest, and biographical
data variables). These documents summarized the literature with regard to
critical issues, suggested the most appropriate organization or taxonomy of
the constructs in each area, and summarized the validities of the various
measures for different types of job performance criteria. (These documents
were subsequently issued as Hough, 1986; McHenry & Rose, 1986; Toquam, Corpe,
& Dunnette, 1990.)

Second, the predictors identified in the review were subjected to
further scrutiny to (a) select tests and inventories to make up the
Preliminary Battery, and (b) select the "best bet" predictor constructs to be
used in the "expert judgment" research activity.

An initial list was compiled of all predictor measures that seemed even
remotely appropriate for Army selection and classification. This list was
then screened by eliminating measures according to several "knockout" factors:
(a) measures developed for a single research project; (b) measures designed
for a narrowly specified population/occupational group (e.g., pharmacy
students); (c) measures targeted toward younger age groups; (d) measures
requiring unusually long testing times; (e) measures requiring difficult or
subjective scoring; and (f) measures requiring individual administration.
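
The knockout screen amounts to dropping any measure that trips at least one
of the six factors. The sketch below shows that logic only; the flag names
and the candidate measures are invented for illustration and do not come from
the project's records.

```python
# Hypothetical flag names, one per knockout factor (a) through (f).
KNOCKOUT_FACTORS = (
    "single_research_project",
    "narrow_population",
    "younger_age_groups",
    "long_testing_time",
    "difficult_or_subjective_scoring",
    "individual_administration",
)

def screen(measures):
    """Return the names of measures that trip none of the knockout factors."""
    return [
        m["name"]
        for m in measures
        if not any(m.get(factor, False) for factor in KNOCKOUT_FACTORS)
    ]

# Invented candidate measures, for illustration only.
candidates = [
    {"name": "Measure 1"},
    {"name": "Measure 2", "individual_administration": True},
    {"name": "Measure 3", "narrow_population": True, "long_testing_time": True},
]
retained = screen(candidates)
```

Here only "Measure 1" survives, since each of the other two trips at least
one factor.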

Application of the knockout factors resulted in a second list of
candidate measures that served as the final selection of constructs to be
included in the "expert judgment" research. This research was designed to use
expert judgment to estimate the potential validity of each relevant construct,
if it were reliably measured. Schmidt, Hunter, Croll, and McKenzie (1983) have
shown that pooled expert judgments, obtained from experienced personnel
psychologists, have considerable accuracy for estimating the validity of tests
in actual, empirical, criterion-related validity research.

Expert Forecasts of Predictor Construct Validities

Peterson and Bownas (1982) provide a complete description of the
methodology which has been used successfully by Bownas and Heckman (1976),
Peterson, Houston, Bosshardt, and Dunnette (1977), Peterson and Houston
(1980), and Peterson, Houston, and Rosse (1984) to identify predictors for the
jobs of firefighter, correctional officer, and entry-level occupations
(clerical and technical), respectively. Descriptive information about a set
of predictors and the job performance criterion variables is given to
"experts" in personnel selection and classification. These experts estimate
the relationships between predictor and criterion variables by rating or
directly estimating the value of the correlation coefficients.

The result is a matrix with predictor and criterion variables as the
columns and rows, respectively. Cell entries are experts' estimates of the
degree of relationship between the particular predictors and various criteria.
The interrater reliability of the experts' estimates is checked first. If the
estimate is sufficiently reliable (previous research shows values in the .80
to .90 range for about 10 to 12 experts), the matrix of predictor-criterion
relationships can be analyzed and used in a variety of ways. For example, by
correlating the rows of the matrix the covariances between criteria can be
estimated, and by correlating the columns the covariances between predictors
can be estimated on the basis of the profiles of their estimated relationships
with the criteria. The covariances can then be factor analyzed to identify
clusters of predictors within which the measures are expected to exhibit
similar patterns of correlations with different performance components.
Similarly, the criterion covariances can be examined to identify clusters of
criteria predicted by a common set of predictors.
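
The row and column correlations described above can be sketched as follows.
The matrix values are invented for illustration, the function names are
hypothetical, and the subsequent factor analysis is not shown; the only
purpose is to make concrete which profiles are being correlated (criteria as
rows, predictors as columns).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def profile_correlations(profiles):
    """Correlate every profile with every other profile."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

# Rows are criteria, columns are predictors. The cell values stand in for
# mean estimated validities and are invented for this example.
matrix = [
    [0.40, 0.35, 0.10],
    [0.30, 0.25, 0.15],
    [0.10, 0.05, 0.30],
]

# Correlating rows gives the relationships among criteria.
criterion_r = profile_correlations(matrix)

# Correlating columns gives the relationships among predictors.
predictor_r = profile_correlations([list(col) for col in zip(*matrix)])
```

In this toy matrix the first two predictors have parallel validity profiles
across the criteria and so correlate almost perfectly, while the third has a
contrasting profile and correlates negatively with the first. Predictors that
group together in this way are the clusters the text refers to.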

Such procedures helped in identifying redundancies and overlap in the
predictor set. The clusters of predictors and of criteria are an important
product for a number of reasons. First, they provide an efficient and
organized means of summarizing the data generated by the experts. Second, the
summary form permits easier comparison with the results of meta-analyses of
empirical estimates of criterion-related validity coefficients. Third, these
clusters provide a model or theory of the predictor-criterion performance
space.

Method 


For Project A, the experts were 35 industrial, measurement, or
differential psychologists with experience and knowledge in personnel
selection research and/or applications.

The previous reviews of the population of constructs had identified a
basic list of 53 variables, and materials describing each of these variables
were prepared. The procedure used to identify criterion variables was based
on the job descriptions of the sample of 111 MOS that had been previously
clustered by job experts as part of the MOS sample selection. Criterion
categories were developed by reviewing the descriptions to determine common
job performance activities.

After common elements in the 23 clusters were identified, additional
categories were identified to cover unique aspects of jobs in the sample of
111. Most of the 53 performance component categories applied to several
jobs, and most of the jobs were characterized by activities from several
categories. The second type of criterion variable was a set that described
performance in initial Army training as defined in archival records and
interviews with trainers. The final set of criterion variables consisted of
the general performance categories defined by the behavioral dimensions
developed as part of Task 4. In all, 72 possible criterion constructs were
defined for use in the expert judgment task.

Each judge estimated the true validity of each predictor for each
criterion (i.e., criterion-related validity corrected for such artifacts as
range restriction and reliability, and unaffected by variation in sample
sizes). All judges completed the task during the first week of October 1983.

When averaged across raters, the reliability of the mean estimated cell
validities was .96. Factor analyses were based on these cell means. The
analysis most pertinent to this summary report is that of the predictor
profiles.
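
The report does not state how the reliability of the mean estimates was
computed. One common way to relate the reliability of a pooled rating to the
number of raters is the Spearman-Brown formula, and the sketch below applies
it as an assumption, not as the project's documented method. It takes the
reported .96 for 35 judges, backs out the implied single-judge reliability,
and projects what a panel of 10 would give.

```python
def spearman_brown(single_rater_r, k):
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)

def implied_single_rater_r(pooled_r, k):
    """Invert Spearman-Brown: single-rater reliability implied by the pool."""
    return pooled_r / (k - (k - 1) * pooled_r)

# Reported in the text: reliability of .96 for the mean of 35 judges.
r1 = implied_single_rater_r(0.96, 35)

# Projected reliability for a smaller panel of 10 judges.
r10 = spearman_brown(r1, 10)
```

Under this assumption the implied single-judge reliability is about .41 and
the projection for 10 judges is about .87, which falls inside the .80 to .90
range cited earlier for panels of 10 to 12 experts.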




Eight interpretable factors were named: I, Cognitive Abilities; II,
Visualization/Spatial; III, Information Processing; IV, Mechanical; V,
Psychomotor; VI, Social Skills; VII, Vigor; VIII, Motivation/Stability. These
eight factors appeared to be composed of 21 clusters, and the hierarchical
structure is shown in Figure 2-1.

Variables for measurement were sampled from the hierarchy on the basis
of (a) a careful review of the empirical literature within each category,
(b) visits to all major military personnel research stations, (c) on-site
observations of individuals during field exercises in the combat specialties,
and (d) a multistage review of all available information by the project staff
and the Scientific Advisory Group.


Identification of Pilot Trial Battery Measures

In March 1984, a formal In Progress Review (IPR) meeting was held to
decide on the measures to be developed for the Pilot Trial Battery.
Information from the literature review, expert judgments, initial analyses of
the Preliminary Battery, and the first three phases of computer battery
development was presented and discussed. The Project A staff made
recommendations for inclusion of measures and these were evaluated and revised.
Figure 2-2 shows the results of that deliberation process.

This set of recommendations constitutes the initial array of predictor
variables  for  which  measures  would  be  constructed  and  then  submitted  to  a 
series  of  pilot  tests  and  field  tests,  with  revisions  being  made  after  each 
phase. 


PREDICTOR  DEVELOPMENT:  COGNITIVE  PAPER-AND-PENCIL  MEASURES 

Development of measurement operations for the high-priority constructs
considered the following issues: (a) a definition of the target cognitive
ability; (b) the target population or target MOS for which the measure is
hypothesized to most effectively predict success; (c) published tests that
served as markers for each new measure; (d) intended level of item difficulty;
and (e) type of test (i.e., speed, power, or a combination).

Brief descriptions of the individual tests, as initially designed, are
given below, along with an explanation of the constructs the tests are
intended to represent.

Spatial Visualization - Rotation

Spatial  visualization  involves  the  ability  to  mentally  manipulate 
components  of  two-  or  three-dimensional  figures  into  other  arrangements.  The 
process  involves  restructuring  the  components  of  an  object  and  accurately 
discerning  their  appropriate  appearance  in  new  configurations.  This  construct 
includes  several  subcomponents,  two  of  which  are  rotation  and  scanning.  The 
two  tests  developed  to  measure  visual  rotation  ability  are  Assembling  Objects 
and  Object  Rotation,  involving  three-dimensional  and  two-dimensional  objects, 
respectively. 

Assembling  Objects  Test.  This  test  was  designed  to  assess  the  ability 
to  visualize  how  an  object  will  look  when  its  parts  are  put  together 
correctly.  This  measure  was  intended  to  combine  power  and  speed  components, 
with  speed  receiving  greater  emphasis.  Each  item  presents  subjects  with 
components or parts of an object. The task is to select, from among four
alternatives,  the  one  object  that  depicts  the  components  or  parts  put  together 
correctly.  Published  tests  identified  as  markers  for  Assembling  Objects 
include  the  Employee  Aptitude  Survey  Space  Visualization  (EAS-5)  and  the 
Flanagan  Industrial  Test  (FIT)  Assembly. 

Object Rotation Test. The initial version contained 60 items with a
7-minute time limit. The subject's task is to examine a test object and
determine whether the figure represented in each item is the same as the test
object, only rotated, or is not the same as the test object (e.g., flipped
over). Published tests serving as markers for the Object Rotation measure
include Educational Testing Service (ETS) Card Rotations, Thurstone's Flags
Test, and Shepard-Metzler Mental Rotations.
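
The distinction the items turn on, rotated versus flipped over, can be made
concrete with a small sketch. It is an illustration only: figures are
represented as sets of grid cells, only quarter-turn rotations are checked
(actual test items may use other angles), and the names normalize, rotations,
and is_rotation are hypothetical.

```python
def normalize(cells):
    """Shift a figure so its bounding box starts at row 0, column 0."""
    min_r = min(r for r, c in cells)
    min_c = min(c for r, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)

def rotations(cells):
    """The four quarter-turn rotations of a figure, each normalized."""
    shape = set(cells)
    out = []
    for _ in range(4):
        out.append(normalize(shape))
        shape = {(c, -r) for r, c in shape}   # rotate by a quarter turn
    return out

def is_rotation(test_object, figure):
    """True if the figure is the test object rotated in the plane."""
    return normalize(figure) in rotations(test_object)

# An L-shaped test object, a rotated copy, and a mirror-image copy.
L_SHAPE = {(0, 0), (1, 0), (2, 0), (2, 1)}
rotated = {(c, -r) for r, c in L_SHAPE}
flipped = {(r, -c) for r, c in L_SHAPE}

same = is_rotation(L_SHAPE, rotated)       # rotated copy of the object
different = is_rotation(L_SHAPE, flipped)  # mirror image of the object
```

The L shape is useful here because its mirror image cannot be reached by any
in-plane rotation, so the rotated copy is accepted and the flipped copy is
rejected.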

Spatial  Visualization  -  Scanning 

The  second  component  of  spatial  visualization  ability  is  spatial 
scanning,  which  requires  the  subject  to  visually  survey  a  complex  field  and 
find  a  pathway  through  it,  utilizing  a  particular  configuration.  The  Path 
Test  and  the  Maze  Test  were  developed  to  measure  this  component. 

Path  Test.  The  Path  Test  requires  subjects  to  determine  the  best  path 
or  route  between  two  points.  Subjects  are  presented  with  a  map  of  airline 
routes  or  flight  paths.  The  subject's  task  is  to  find  the  "best"  path  or  the 
path  between  two  points  that  requires  the  fewest  stops.  Published  tests 
serving  as  markers  for  construction  of  the  Path  Test  include  ETS  Map  Planning 
and  ETS  Choosing  a  Path. 
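
Finding the route with the fewest stops is a shortest-path problem on an
unweighted network, which a breadth-first search solves. The sketch below is
an illustration of that idea under assumed inputs; the route map, the city
labels, and the function name fewest_stops are invented and are not taken
from the test materials.

```python
from collections import deque

def fewest_stops(routes, origin, destination):
    """Return a path with the fewest stops, or None if no path exists."""
    graph = {}
    for a, b in routes:                      # routes can be flown either way
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    queue = deque([[origin]])
    seen = {origin}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Invented route map: A to D is possible via B and C, or via E alone.
routes = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
best = fewest_stops(routes, "A", "D")
```

For this map the search returns the route through E, which needs one
intermediate stop rather than two.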


Maze  Test.  The  first  pilot  test  version  of  the  Maze  Test  contained  24 
rectangular mazes, with four entrance points and three exit points. The task
is  to  determine  which  of  the  four  entrances  leads  to  a  pathway  through  the 
maze  and  to  one  of  the  exit  points.  A  9-minute  limit  was  established. 

Field  Independence 

This construct involves the ability to find a simple form when it is
hidden in a complex pattern. Given a visual percept or configuration, field
independence refers to the ability to hold the percept or configuration in
mind so as to distinguish it from other well-defined perceptual material.

Shapes Test. The marker test is ETS Hidden Figures. The strategy for
constructing the Shapes Test was to use a task similar to that in the Hidden
Figures Test while ensuring that the difficulty level of test items was geared
more toward the Project A target population. The test was to be speeded, but
not nearly so much so as the Hidden Figures. At the top of each test page are
five simple shapes; below these shapes are six complex figures. Subjects are
instructed to examine the simple shapes and then to find the one simple shape
located in each complex figure.

Spatial  Orientation 

This  construct  involves  the  ability  to  maintain  one's  bearings  with 
respect  to  points  on  a  compass  and  to  maintain  location  relative  to  land¬ 
marks. It was not included in the list of predictor constructs evaluated by
the expert panel, but it had proved useful during World War II, when the Army
Air Forces (AAF) Aviation Psychology Program explored a variety of measures
for selecting air crew personnel. Also, during the second year of Project A,
a number of job observations suggested that some MOS involve critical job
requirements of maintaining directional orientation and establishing location,
using features or landmarks in the environment. Consequently, three different
measures of this construct were formulated.

Orientation  Test  1.  Direction  Orientation  Form  B  (CP515B)  developed  by 
researchers  in  the  AAF  Aviation  Psychology  Program  served  as  the  marker  for 
Orientation  Test  1.  Each  test  item  presented  subjects  with  six  circles.  In 
the  test's  original  form,  the  first,  or  Given,  circle  indicated  the  compass 
direction  for  North.  For  most  items,  North  was  rotated  out  of  its  conven¬ 
tional  position.  Compass  directions  also  appeared  on  the  remaining  five 
circles.  The  subject's  task  was  to  determine,  for  each  circle,  whether  or  not 
the direction indicated was correctly positioned by comparing it to the
direction of North in the Given circle.
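
The judgment required by each item reduces to adding a fixed offset to the
rotated position of North. The sketch below illustrates this under several
assumptions that are not in the source: an eight-point compass, positions
expressed in whole degrees clockwise from the top of the page, and the
hypothetical names BEARINGS and is_correct.

```python
# Offset of each compass point from North, in degrees clockwise.
BEARINGS = {
    "N": 0, "NE": 45, "E": 90, "SE": 135,
    "S": 180, "SW": 225, "W": 270, "NW": 315,
}

def is_correct(north_angle, label, label_angle):
    """Check a labeled direction against the Given circle's North.

    north_angle: where North points in the Given circle.
    label:       compass point printed on the circle being judged.
    label_angle: where that label is drawn on the circle being judged.
    All angles are in degrees clockwise from the top of the page.
    """
    expected = (north_angle + BEARINGS[label]) % 360
    return expected == label_angle % 360

# North is rotated to point right (90 degrees), so East belongs at the
# bottom of the circle (180 degrees).
east_ok = is_correct(90, "E", 180)
west_ok = is_correct(90, "W", 180)
```

With North rotated to 90 degrees, an "E" drawn at the bottom of the circle is
correctly positioned and a "W" drawn there is not.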

Orientation Test 2. Each item contains a picture within a circular or
rectangular frame. The bottom of the frame has a circle with a dot inside it.
The picture or scene is not in an upright position. The task is to mentally
rotate the frame so that the bottom of the frame is positioned at the bottom
of the picture. After doing so, one must then determine where the dot will
appear in the circle. The original form of the test contained 24 items, and a
10-minute time limit was established.

Orientation Test 3. This test was modeled after another spatial
orientation test, Compass Directions, developed in the AAF Aviation Psychology
Program. Orientation Test 3 presented subjects with a map that includes
various landmarks such as a barracks, a campsite, a forest, and a lake. Within
each item, subjects are provided with compass directions by indicating the
direction from one landmark to another, such as "the forest is North of the
campsite." Subjects are also informed of their present location relative to
another landmark. Given this information, the subject must determine which
direction to go to reach yet another structure or landmark. For each item,
new or different compass directions are given.

Induction/Figural Reasoning

This construct involves the ability to generate hypotheses about
principles governing relationships among several objects. Example measures of
induction include the Employee Aptitude Survey Numerical Reasoning (EAS-6),
ETS Figure Classification, Differential Aptitude Test (DAT) Abstract
Reasoning, Science Research Associates (SRA) Word Grouping, and Raven's
Progressive Matrices. These paper-and-pencil measures present subjects with a
series of objects such as figures, numbers, or words. To complete the task,
subjects must first determine the rule governing the relationship among the
objects and then apply the rule to identify the next object in the series. Two
different measures of the construct were developed for Project A.

Reasoning Test 1. The plan was to construct a test that was similar to
the task appearing in EAS-6, Numerical Reasoning, but with one major
difference: items would be composed of figures rather than numbers. Reasoning
Test 1 items present subjects with a series of four figures; the task is to
identify from among five possible answers the one figure that should appear
next in the series.

Reasoning  Test  2.  The  ETS  Figure  Classification  test,  which  served  as 
the  marker,  requires  subjects  to  identify  similarities  and  differences  among 
groups  of  figures  and  then  to  classify  test  figures  into  those  groups.  Items 
in  Reasoning  Test  2  were  designed  to  involve  only  the  first  task.  The  test 
items  present  five  figures.  Subjects  are  asked  to  determine  which  four 
figures  are  similar  in  some  way,  thereby  identifying  the  one  figure  that 
differs  from  the  others. 


PREDICTOR DEVELOPMENT: COMPUTER-ADMINISTERED TESTS

There  were  four  phases  of  activities:  (a)  information  gathering  about 
past  and  current  research  in  perceptual/psychomotor  measurement  and  com¬ 
puterized  methods  of  testing  such  abilities;  (b)  construction  of  a  demonstra¬ 
tion  computer  battery;  (c)  selection  of  commercially  available  microprocessors 
and  peripheral  devices,  writing  of  software  for  testing  several  abilities 
using  this  hardware,  and  tryout  of  this  hardware  and  software;  (d)  continued 
development of software, and the design and construction of a custom-made
response  pedestal. 

Compared to the paper-and-pencil measurement of cognitive abilities,
computerized measurement of psychomotor and perceptual abilities was in a
relatively  primitive  state.  Much  work  had  been  done  in  World  War  II  using 
electromechanical  apparatus,  but  relatively  little  work  had  occurred  since 
then.  Microprocessor  technology  held  out  the  promise  of  improving  measurement 
in  this  area,  but  the  work  was  (and  still  is)  in  its  early  stages. 

Development  of  Response  Pedestal 

Development  of  the  computer-administered  measures  was  in  turn  dependent 
upon  development  of  the  appropriate  hardware  and  software.  The  portable 
microprocessor  selected  for  use  was  modeled  after  the  COMPAQ  but  the  prelimi¬ 
nary  trials  suggested  that  the  use  of  a  keyboard  may  provide  an  unfair 
advantage  to  subjects  who  have  typing  or  data  entry  experience,  so  a  separate 
response pedestal was designed and built.

This response pedestal is depicted in Figure 2-3. Note that it contains
two joysticks (one for left-handed and one for right-handed subjects), two
sliding resistors, a dial for entering demographic data such as age and social
security number, two red buttons, three response buttons (blue, yellow, and
white), and four green "home" buttons.

To  begin  a  trial,  the  subjects  must  place  their  hands  on  the  four  green 
buttons. After the stimulus appears on the screen and the subject has
determined the correct response, he or she must remove the preferred hand from
the "home" buttons and press the correct response button. The "home" buttons
serve two purposes. First, control is added over the location of the hands
while the stimulus item is presented. Second, procedures involving these
buttons are designed to assess two theoretically important components of
reaction time measures: decision time and movement time.


Figure 2-3. Response pedestal for computerized tests.

This construct involves speed of reaction to stimuli, that is, the speed
with which a person perceives the stimulus independent of any time taken by
the motor response component of the classic reaction time measures. It is
intended to be an indicator of processing efficiency and includes both simple
and choice reaction time.

Simple Reaction Time: RT Test 1. The basic paradigm for this task
stems  from  Jensen's  research  involving  the  relationship  between  reaction  time 
and  mental  ability  (Jensen,  1982).  On  the  computer  screen,  a  small  box 
appears.  After  a  delay  period  (ranging  from  1.5  to  3.0  seconds)  the  word 
YELLOW  appears  in  the  box.  The  subject  must  remove  the  preferred  hand  from 
the  "home"  buttons  to  strike  the  yellow  key.  The  subject  must  then  return 
both  hands  to  the  ready  position  to  receive  the  next  item. 

Choice Reaction Time: RT Test 2. Reaction time for two response
alternatives  is  obtained  by  presenting  the  word  BLUE  or  WHITE  on  the  screen. 
The  subjects  are  instructed  that,  when  one  of  these  appears,  they  are  to  move 
the  preferred  hand  from  the  "home"  keys  to  strike  the  key  that  corresponds 
with  the  word  appearing  on  the  screen  (BLUE  or  WHITE). 


This  construct  is  defined  as  the  rate  at  which  one  observes,  searches, 
and  recalls  information  contained  in  short-term  memory. 

Memory Search Test. The marker was a short-term memory search task
introduced  by  S.  Sternberg  (1966,  1969)  and  the  measure  developed  for  Project 
A  is  similar.  The  first  stimulus  set  appears  and  contains  one,  two,  three, 
four,  or  five  objects  (letters).  Following  a  display  period  of  0.5  or  1.0 
second,  the  stimulus  set  disappears  and,  after  a  delay,  the  probe  item 
appears.  Presentation  of  the  probe  item  is  delayed  by  either  2.5  or  3.0 
seconds  and  the  subject  must  then  decide  whether  or  not  it  appeared  in  the 
stimulus set. If the item was present in the stimulus set, the subject
strikes  the  white  key.  If  the  probe  item  was  not  present,  the  subject  strikes 
the  blue  key. 


Parameters of interest include the number of letters in the stimulus
set, length of observation period, probe delay period, and probe status (i.e.,
the probe is either in the stimulus set or not in the stimulus set). Subjects
receive  scores  on  the  following  measures: 


The Slope and Intercept obtained by regressing mean total reaction
time (correct responses only) against item length. In terms of
processing efficiency, the slope represents the average increase in
reaction time with an increase of one object in the stimulus set.
The Intercept represents all other processes not involved in memory
search, such as encoding the probe, determining whether or not a
match has been found, and executing the response.

Percent Correct scores, used to identify subjects performing at
very low levels which would preclude computation of the above
scores.

The Grand Mean obtained by calculating the mean of the mean reaction
time (correct responses only) for each level of stimulus set length
(i.e., one to five).
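
The scores listed above can be illustrated with a short sketch. It is not the Project A scoring program; the trial layout (set size, reaction time, correct/incorrect) and the function name memory_search_scores are assumptions made for illustration, and an ordinary least-squares line is assumed for the regression.

```python
from statistics import mean


def memory_search_scores(trials):
    """Score one subject's memory search trials.

    `trials` is a list of (set_size, reaction_time, correct) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    if not trials:
        raise ValueError("no trials to score")

    # Percent Correct uses every trial, right or wrong.
    n_correct = sum(1 for _, _, correct in trials if correct)
    percent_correct = 100.0 * n_correct / len(trials)

    # Mean reaction time at each set size, correct responses only.
    rts_by_size = {}
    for set_size, rt, correct in trials:
        if correct:
            rts_by_size.setdefault(set_size, []).append(rt)
    if len(rts_by_size) < 2:
        raise ValueError("need correct responses at two or more set sizes")

    sizes = sorted(rts_by_size)
    cell_means = [mean(rts_by_size[size]) for size in sizes]

    # Least-squares regression of the cell means on set size.
    x_bar = mean(sizes)
    y_bar = mean(cell_means)
    sxx = sum((x - x_bar) ** 2 for x in sizes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, cell_means))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    return {
        "slope": slope,              # added time per extra item in the set
        "intercept": intercept,      # time for all non-search processes
        "grand_mean": y_bar,         # mean of the per-set-size means
        "percent_correct": percent_correct,
    }
```

A subject whose correct reaction times rise by a fixed amount for each added letter would receive that amount as the slope, with the remaining time absorbed by the intercept.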

Perceptual Speed and Accuracy

Perceptual  speed  and  accuracy  involves  the  ability  to  perceive  visual 
information  quickly  and  accurately  and  to  perform  simple  processing  tasks  with 
the  stimulus  (e.g.,  make  comparisons).  This  requires  the  ability  to  make 
rapid scanning movements without being distracted by irrelevant visual
stimuli, and measures memory, working speed, and sometimes eye-hand
coordination.

Perceptual  Speed  and  Accuracy  Test.  Measures  used  as  markers  for  the 
development  of  the  computerized  Perceptual  Speed  and  Accuracy  (PS&A)  Test 
included  the  Employee  Aptitude  Survey  Visual  Speed  and  Accuracy  (EAS-4)  and 
the  ASVAB  Coding  Speed.  The  computer-administered  Perceptual  Speed  and 
Accuracy Test requires the ability to make a rapid comparison of two visual
stimuli  presented  simultaneously  and  determine  whether  they  are  the  same  or 
different.  Five  different  types  of  stimuli  are  presented:  alpha,  numeric, 
symbolic,  mixed,  and  word.  Within  the  alpha,  numeric,  symbolic,  and  mixed 
stimuli,  the  character  length  of  the  stimulus  is  varied.  Four  levels  of 
character  stimulus  length  are  present:  two,  five,  seven,  and  nine. 

Target  Identification  Test.  In  this  test,  each  item  shows  a  target 
object  near  the  top  of  the  screen  and  three  color-labeled  stimuli  in  a  row 
near  the  bottom  of  the  screen.  Examples  are  shown  in  Figure  2-4.  The  subject 
is  to  identify  which  of  the  three  stimuli  represents  the  same  object  as  the 
target  and  to  press,  as  quickly  as  possible,  the  button  (blue,  yellow,  or 
white)  that  corresponds  to  that  object.  The  objects  shown  are  based  on 
military  vehicles  and  aircraft  as  shown  on  the  standard  set  of  flashcards  used 
to  train  soldiers  to  recognize  equipment  presently  being  used  by  various 
nations.  Several  parameters  were  varied  in  the  stimulus  presentation.  In 
addition  to  type  of  object,  the  position  of  the  correct  response  (left  or 
right  side  of  the  screen),  the  orientation  of  the  target  object  (facing  in  the 
same  direction  as  the  stimuli  or  in  the  opposite  direction),  variation  in  the 
angle  of  rotation  (from  horizontal)  of  the  target  object,  and  the  size  of  the 
target  object  were  incorporated  into  the  test. 

Psychomotor Precision

This  construct  reflects  the  ability  to  make  the  muscular  movements 
necessary  to  adjust  or  position  a  machine  control  mechanism.  The  ability 
applies  both  to  anticipatory  movements  where  the  stimulus  condition  is 
continuously  changing  in  an  unpredictable  manner  and  to  controlled  movements 
where stimulus conditions change in a predictable fashion. Psychomotor
precision thus encompasses two of the ability constructs identified by Fleishman
and his associates, control precision and rate control (Fleishman, 1967).

Figure 2-4. Graphic displays of example items from the computer-
administered Target Identification Test.


Performance on tracking tasks is very likely related to psychomotor
precision and, since tracking tasks are an important part of many Army MOS,
development of psychomotor precision tests was made a high priority. The
initial computer battery included two measures of this ability.

Target Tracking Test 1. This test was designed to measure control
precision, and the AAF Rotary Pursuit Test served as a model. For each trial,
subjects are shown a path consisting entirely of vertical and horizontal line
segments. At the beginning of the path is a target box, and centered in the
box are crosshairs. As the trial begins, the target starts to move along the
path at a constant rate of speed. The subject's task is to keep the
crosshairs centered within the target at all times. The subject uses a
joystick, controlled with one hand, to control movement of the crosshairs.

Item parameters include the speed of the crosshairs, the maximum speed
of  the  target,  the  difference  between  crosshairs  and  target  speeds,  the  total 
length  of  the  path,  the  number  of  line  segments  comprising  the  path,  and  the 
average  amount  of  time  the  target  spends  traveling  along  each  segment. 

Two kinds of scores were investigated: (a) tracking accuracy and (b)
improvement in tracking performance. Two accuracy measures were investigated,
time  on  target  and  distance  from  the  center  of  crosshairs  to  the  center  of  the 
target.  The  test  program  computes  the  distance  from  the  crosshairs  to  the 
center  of  the  target  several  times  each  second,  and  then  averages  these 
distances  to  derive  an  overall  accuracy  score  for  that  trial.  Subsequently, 
to  remove  positive  skew,  each  trial  score  was  transformed  by  taking  the  square 
root  of  the  average  distance.  These  trial  scores  were  then  averaged  to 
determine  an  overall  tracking  accuracy  score. 
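
The accuracy computation just described (average distance within a trial, square root to reduce skew, then the mean across trials) can be sketched as follows. The data layout and function names are assumptions for illustration only; the sampling rate and screen units used by the actual test program are not given here.

```python
from math import hypot, sqrt
from statistics import mean


def trial_accuracy(samples):
    """Return the transformed accuracy score for one trial.

    `samples` is a list of ((cross_x, cross_y), (target_x, target_y))
    pairs taken several times per second (hypothetical layout).
    """
    distances = [
        hypot(cross_x - target_x, cross_y - target_y)
        for (cross_x, cross_y), (target_x, target_y) in samples
    ]
    # Average the sampled distances, then take the square root
    # to remove positive skew.
    return sqrt(mean(distances))


def overall_tracking_accuracy(trials):
    """Average the transformed trial scores across all trials."""
    return mean([trial_accuracy(samples) for samples in trials])
```

Because the score is a distance, lower values indicate better tracking.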

Target Shoot Test. This test was modeled after several compensatory and
pursuit tracking tests used by the AAF in the Aviation Psychology Program
(e.g., the Rate Control Test). For the Target Shoot Test, a target box and
crosshairs appear in different locations on the computer screen. The target
moves about the screen in an unpredictable manner, frequently changing speed
and direction. The subject controls movement of the crosshairs via a joystick,
and the task is to move the crosshairs into the center of the target and to
"fire" at the target. The score is the distance from the center of the
crosshairs to the center of the target.

Several item parameters were varied from trial to trial, including the
maximum speed of the crosshairs, the average speed of the target, the
difference between crosshairs and target speeds, the number of changes in
target speed (if any), the number of line segments comprising the path of each
target, and the average amount of time required for the target to travel each
segment.

Three scores were obtained for each trial. Two were measures of
accuracy: (a) the distance from the center of the crosshairs to the center
of the target at the time of firing, and (b) whether the subject "hit" or
"missed" the target. The third score reflected speed and was measured by the
time from trial onset until the subject fired at the target.

Multilimb Coordination

This ability does not apply to tasks in which trunk movement must be
integrated with limb movements. It refers to tasks where the body is at rest
(e.g., seated or standing) while two or more limbs are in motion.


Target Tracking Test 2. This test is very similar to the Two-Hand
Coordination test developed by the AAF. For each trial subjects are shown a
path consisting entirely of vertical and horizontal lines. At the beginning
of the path is a target box, and centered in the box are crosshairs. As the
trial begins, the target starts to move along the path at a constant rate of
speed. The subject manipulates two sliding resistors to control movement of
the crosshairs. One resistor controls movement in the horizontal plane, the
other in the vertical plane. The subject's task is to keep the crosshairs
centered within the target at all times. This test and Target Tracking Test 1
are virtually identical except for the nature of the required control
manipulation.

Number Operations

This  construct  involves  the  ability  to  perform,  quickly  and  accurately, 
simple  arithmetic  operations  such  as  addition,  subtraction,  multiplication, 
and  division. 

Number Memory Test. This test was modeled after a number memory test
developed by Dr. Raymond Christal at the Air Force Human Resources Laboratory.
Subjects  are  presented  with  a  single  number  on  the  computer  screen.  After 
studying  the  number,  the  subject  is  instructed  to  push  a  button  to  receive  the 
next  part  of  the  problem.  When  the  button  is  pressed,  the  first  part  of  the 
problem  disappears  and  another  number,  along  with  an  operation  term  such  as 
Add 9 or Subtract 6, then appears. Once the subject has combined the first
number  with  the  second,  he  or  she  must  press  another  button  to  receive  the 
third  part  of  the  problem.  This  procedure  continues  until  a  solution  to  the 
problem  is  presented.  The  subject  must  then  indicate  whether  the  solution 
presented is right or wrong. Test items vary with respect to the number of parts
(four, six, or eight) contained in the single item, and the interstimulus
delay  period.  This  test  is  not  a  "pure"  measure  of  number  operations,  since 
it  also  is  designed  to  bring  short-term  memory  into  play. 
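
The item logic described above can be sketched in a few lines. This is an illustration only: the operation names, the item layout (a first number followed by operation/operand pairs), and both function names are assumptions, not the test's actual software.

```python
import operator

# Hypothetical mapping from the on-screen operation term to a function.
OPERATIONS = {
    "Add": operator.add,
    "Subtract": operator.sub,
    "Multiply": operator.mul,
    "Divide": operator.truediv,
}


def running_total(first_number, steps):
    """Combine the first number with each later part, in order.

    `steps` is a list of (operation_name, operand) pairs, e.g.
    [("Add", 9), ("Subtract", 6)].
    """
    total = first_number
    for operation_name, operand in steps:
        total = OPERATIONS[operation_name](total, operand)
    return total


def response_is_correct(first_number, steps, presented_solution, says_right):
    """Score the subject's right/wrong judgment of the presented solution."""
    solution_is_right = running_total(first_number, steps) == presented_solution
    return says_right == solution_is_right
```

A four-part item in this layout is a first number followed by three operation steps; the subject is scored as correct when the "right" or "wrong" judgment matches the true status of the presented solution.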

Movement  Judgment 

Movement  judgment  is  the  ability  to  judge  the  relative  speed  and 
direction  of  one  or  more  moving  objects  to  determine  where  those  objects  will 
be  at  a  given  point  in  time  and/or  when  those  objects  might  intersect. 

Cannon  Shoot  Test.  The  Cannon  Shoot  Test  measures  subjects'  ability  to 
fire  at  a  moving  target  in  such  a  way  that  the  shell  hits  the  target  when  the 
target  crosses  the  cannon's  line  of  fire.  At  the  beginning  of  each  trial,  a 
stationary  cannon  appears  on  the  video  screen;  the  starting  position  varies 
from  trial  to  trial.  The  cannon  is  "capable"  of  firing  a  shell,  which  travels 
at a constant speed on each trial. Shortly after the cannon appears, a
circular  target  moves  onto  the  screen.  This  target  moves  in  a  constant 
direction  at  a  constant  rate  of  speed  throughout  the  trial,  though  the  speed 
and direction vary from trial to trial. The subject's task is to push a
response button to fire the shell so that the shell intersects the target when
the  target  crosses  the  shell's  line  of  fire. 

Three  parameters  determine  the  nature  of  each  test  trial:  the  angle  of 
the  target  movement  relative  to  the  position  of  the  cannon,  the  distance  from 
the  cannon  to  the  impact  point,  and  the  distance  from  impact  point  to  fire 
point. 
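
The relation between the impact point and the fire point can be made concrete with a small sketch. It assumes constant shell and target speeds, as the test description states, and treats the fire point as the target's position at the ideal firing moment; the function names and the signed timing-error score are hypothetical and are not taken from the test's scoring rules.

```python
def shell_travel_time(cannon_to_impact, shell_speed):
    """Time the shell needs to reach the impact point."""
    return cannon_to_impact / shell_speed


def ideal_fire_distance(cannon_to_impact, shell_speed, target_speed):
    """Distance from impact point to fire point.

    The shell and the target arrive together only if the target is this
    far from the impact point when the button is pressed.
    """
    return target_speed * shell_travel_time(cannon_to_impact, shell_speed)


def timing_error(target_distance_at_fire, cannon_to_impact,
                 shell_speed, target_speed):
    """Signed miss in time units (hypothetical score).

    Positive: the shell reaches the impact point before the target
    (fired too early). Negative: fired too late.
    """
    target_time = target_distance_at_fire / target_speed
    return target_time - shell_travel_time(cannon_to_impact, shell_speed)
```

Under these assumptions, a faster target or a longer cannon-to-impact distance moves the fire point farther from the impact point, so the subject must fire earlier.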

PREDICTOR  DEVELOPMENT:  NON-COGNITIVE  MEASURES 

Two  non-cognitive  paper-and-pencil  inventories  were  developed  for  the 
Pilot  Trial  Battery.  The  ABLE  (Assessment  of  Background  and  Life  Experiences) 
contains  items  that  assess  the  high-priority  constructs  in  the  personality/ 
temperament  and  life  history  (biodata)  domains.  The  AVOICE  (Army  Vocational 
Interest  Career  Examination)  measures  relevant  constructs  pertaining  to 
vocational  interests. 

The extensive literature on temperament, interest, and biographical
data, the results of the expert judgment study, and the covariance matrix from
the  preliminary  battery  were  examined  and  discussed  at  some  length  in  a  series 
of meetings attended by the relevant project staff and members of the
Scientific Advisory Group. The result of these deliberations was an array of
constructs that were judged to be the best potential sources of valid
selection/classification information of a non-cognitive nature. The linkages among
the initial variable array, the constructs chosen for measurement, the
variables proposed to reflect them, and the forecasted predictor/criterion
correlations are shown in Figure 2-5 (Hough, 1984).

The Temperament and Biographical Measures (ABLE)

Following  the  identification  of  the  construct  array,  item  writing  groups 
were  created  and  items  were  written,  revised,  edited,  and  arranged  into 
specific  temperament  and  biographical  scales  that  were  intended  to  be  valid 
measures of the chosen constructs. After this initial phase of item writing,
revision, and scale creation, 11 substantive scales and four response bias
scales were produced. Table 2-1 lists the seven constructs initially chosen
for measurement via the ABLE, the 11 scales subsequently developed to
represent them, and four validity scales developed under Project A. Each construct
is briefly explained below.

Constructs 

Adjustment.  Adjustment  is  defined  as  the  amount  of  emotional  stability 
and stress tolerance that one possesses. The well-adjusted person is generally
calm, displays an even mood, and is not overly distraught by stressful
situations.  He  or  she  thinks  clearly  and  maintains  composure  and  rationality 
in  situations  of  actual  and  perceived  stress.  The  poorly  adjusted  person  is 
nervous,  moody,  and  easily  irritated,  tends  to  worry  a  lot,  and  does  not  do 
well in times of stress.

Figure 2-5. Linkages between literature review, expert judgments, and
Preliminary and Trial Battery on Non-Cognitive Measures.

Table 2-1

Temperament/Biodata Scales (by Construct) Developed for Pilot Trial Battery:
ABLE - Assessment of Background and Life Experiences

Construct                   Scale

Adjustment                  Emotional Stability

Dependability               Nondelinquency
                            Traditional Values
                            Conscientiousness

Achievement                 Work Orientation
                            Self-Esteem

Physical Condition          Physical Condition

Leadership (Potency)        Dominance
                            Energy Level

Locus of Control            Internal Control

Agreeableness/Likability    Cooperativeness

Response Validity Scales    Non-Random Response
                            Unlikely Virtues (Social Desirability)
                            Poor Impression
                            Self-Knowledge

Dependability. The Dependability construct refers to a person's
characteristic degree of conscientiousness. The dependable person is
disciplined, well-organized, planful, respectful of laws and regulations, honest,
trustworthy, wholesome, and accepting of authority. Such a person prefers
order and thinks before acting. The less dependable person is unreliable,
acts on the spur of the moment, and is rebellious and contemptuous of laws and
regulations.

Achievement.  Achievement  is  defined  as  the  tendency  to  strive  for 
competence in one's work. The achievement/work-oriented person works hard,
sets high standards, tries to do a good job, endorses the work ethic, and
concentrates on and persists in completion of the task at hand. This person
is also confident, feels success from past undertakings, and expects to
succeed in the future. The person who is less achievement-oriented has little
ego involvement in his or her work, feels incapable and self-doubting, does
not expend undue effort, and does not feel that hard work is desirable.

Physical Condition. The Physical Condition construct refers to one's
frequency and degree of participation in sports, exercise, and physical

Leadership (Potency). This construct was defined as the degree of
impact, influence, and energy that one displays. The person high on this
characteristic is appropriately forceful and persuasive, is optimistic and
vital, and "gets things done." The person low on this characteristic is timid
about offering opinions or providing direction and is likely to be lethargic
and pessimistic.

Locus  of  Control.  Locus  of  Control  refers  to  one's  characteristic 
belief in the amount of control one has or people have over rewards and
punishments.  The  person  with  an  internal  locus  of  control  expects  that  there 
are  consequences  associated  with  behavior  and  that  people  control  what  happens 
to  them  by  what  they  do.  Persons  with  an  external  locus  of  control  believe 
that  what  happens  is  beyond  their  personal  control. 

Agreeableness/Likability. Agreeableness/Likability is defined as the
degree of pleasantness versus unpleasantness exhibited in interpersonal
relations. The high-scoring person is pleasant, tolerant, tactful, helpful,
not defensive, and generally easy to get along with. His or her
participation in a group adds cohesiveness rather than friction. The low-scoring
person is critical, fault-finding, touchy, defensive, alienated, and generally
contrary.

Validity Scales

The  primary  purpose  of  these  scales  is  to  determine  the  validity  of 
responses,  that  is,  the  degree  to  which  the  responses  are  accurate  depictions 
of  the  person  completing  the  inventory. 

Non-Random  Response.  The  content  (8  items)  asks  about  information  that 
any  person  is  virtually  certain  to  know. 

Unlikely Virtues. This 12-item scale is aimed at detecting those who
respond in a socially desirable manner (i.e., "fake good").

Poor  Impression.  This  was  an  empirically  derived  scale  designed  to 
detect  people  attempting  to  "fake  bad." 

Self-Knowledge.  This  13-item  scale  is  intended  to  identify  people  who 
are  more  self-aware,  more  insightful,  and  more  likely  to  have  accurate 
perceptions  about  themselves. 


The Interest Constructs/Scales (AVOICE)

The  Vocational  Interest  Career  Examination  was  originally  developed  by 
the  Air  Force.  That  inventory  served  as  the  starting  point  for  the  AVOICE 
(Army  Vocational  Interest  Career  Examination).  The  intent  for  the  AVOICE  was 
to  measure  all  six  of  the  constructs  identified  in  Holland's  (1966)  hexagonal 
model  of  interest,  as  well  as  to  provide  sufficient  coverage  of  the  vocational 
areas most important in the Army. The six interest constructs assessed by the
AVOICE, together with their associated scales, are shown in Table 2-2. The
Basic Interest item, one of which is written for each Holland construct,
describes  a  person  with  prototypic  interests.  The  respondent  indicates  how 
well this description fits him or her.

Table 2-2

Holland Basic Interest Constructs, and Army Vocational Interest Career
Examination (AVOICE) Scales Developed for Pilot Trial Battery

Construct        Scale

Realistic        Basic Interest Item
                 Mechanics
                 Heavy Construction
                 Electronics
                 Electronic Communication
                 Drafting
                 Law Enforcement
                 Audiographics
                 Agriculture
                 Outdoors
                 Marksman
                 Infantry
                 Armor/Cannon
                 Vehicle Operator
                 Adventure

Conventional     Basic Interest Item
                 Office Administration
                 Supply Administration
                 Food Service

Social           Basic Interest Item
                 Teaching/Counseling

Investigative    Basic Interest Item
                 Medical Services
                 Mathematics
                 Science/Chemical
                 Automated Data Processing

Enterprising     Basic Interest Item
                 Leadership

Artistic         Basic Interest Item
                 Aesthetics

In addition, the AVOICE included six scales dealing with organizational
climate and environment and an expressed interests scale. The six constructs
that pertain to a person's preference for certain types of work environments
and conditions are assessed by the AVOICE through 20 scales of two items each.
Figure 2-6 shows the constructs, scales, and an item from each scale.


Construct/Scale                  Example

Achievement
  Achievement                    "Do work that gives a feeling of
                                 accomplishment."
  Authority                      "Tell others what to do on the job."
  Ability Utilization            "Make full use of your abilities."

Safety
  Organizational Policy          "A job in which the rules are not equal for
                                 everyone."
  Supervision-Human Resources    "Have a boss that supports the workers."
  Supervision-Technical          "Learn the job on your own."

Comfort
  Activity                       "Work on a job that keeps a person busy."
  Variety                        "Do something different most days at work."
  Compensation                   "Earn less than others do."
  Security                       "A job with steady employment."
  Working Conditions             "Have a pleasant place to work."

Status
  Advancement                    "Be able to be promoted quickly."
  Recognition                    "Receive awards or compliments on the job."
  Social Status                  "A job that does not stand out from others."

Altruism
  Co-workers                     "A job in which other employees were hard to
                                 get to know."
  Moral Values                   "Have a job that would not bother a person's
                                 conscience."
  Social Services                "Serve others through your work."

Autonomy
  Responsibility                 "Have work decisions made by others."
  Creativity                     "Try out your own ideas on the job."
  Independence                   "Work alone."

Figure 2-6. AVOICE organizational climate/environment constructs,
scales within constructs, and an item from each scale.

Although not a psychological construct, expressed interests were
included in the AVOICE because of the extensive research indicating their
validity in criterion-related studies. This Expressed Interests scale
contained eight items which had three response options that formed a continuum
of confidence in the person's occupational choice. Items from this scale
include: "Before you went to the recruiter, how certain were you of the job
you wanted in the Army?", and "If you had the opportunity right now to change
your job in the Army, would you?"

As used in the pilot testing, the AVOICE included 306 items. Nearly all
items were scored on a 5-point scale that ranged from "Like Very Much" (scored
5) to "Dislike Very Much" (scored 1). Items in the Expressed Interests scale
were scored on a 3-point scale in which the response options were different
for each item, yet one option always reflected the most interest, one moderate
interest,  and  one  the  least  interest. 

Summary of Non-Cognitive Measures

The  two  non-cognitive  inventories  of  the  Pilot  Trial  Battery,  the  ABLE 
and  the  AVOICE,  were  designed  to  measure  a  total  of  20  constructs  plus  a 
validity  scale  category.  The  ABLE  assessed  six  temperament  constructs  and  the 
Physical  Condition  construct  through  11  scales,  and  also  included  four 
validity  scales.  Altogether,  the  46  scales  of  the  inventories  included 
approximately  600  items. 

The  psychometric  data  obtained  in  pilot  tests  with  both  inventories 
seemed  highly  satisfactory;  the  scales  were  shown  to  be  reliable  and  appeared 
to  be  measuring  the  constructs  intended. 

PILOT  AND  FIELD  TESTS  OF  THE  PILOT  TRIAL  PREDICTOR  BATTERY 

Initial  Pilot  Tests 

Each instrument in each category (cognitive paper-and-pencil,
computerized, and non-cognitive) was pilot tested one or more times with various
small  samples  from  Fort  Campbell,  Fort  Carson,  and  Fort  Lewis.  Based  on 
feedback  from  the  respondents,  refinements  were  made  in  directions,  format, 
and item wording. A few items were dropped because of extreme item
statistics. However, the basic structure of each instrument remained the same until
more  data  from  the  larger  scale  field  tests  became  available. 

Field  Tests 

The  final  step  before  the  Concurrent  Validation  was  a  more  systematic 
series of field tests of all the predictor measures, using larger samples.

The  outcome  of  the  field  test/revision  process  was  the  final  form  of  the 
predictor  battery  (i.e.,  the  Trial  Battery)  to  be  used  in  the  Concurrent 
Validation. 

Field  tests  were  conducted  at  three  sites.  The  sites  and  basic  purposes 
of the field test at each site are described below.

Fort Knox - The full Pilot Trial Battery (PTB) was administered here to
evaluate the psychometric characteristics of all the measures and to analyze
the covariance of the measures with each other and with the ASVAB. In
addition, the measures were readministered to part of the sample to provide
data for estimating the test-retest reliability of the measures. Finally,
part  of  the  sample  received  practice  on  some  of  the  computer  measures  and  were 
then  retested  to  obtain  an  estimate  of  the  effects  of  practice  on  scores  on 
computer  measures. 

Fort Bragg - The non-cognitive Pilot Trial Battery measures, Assessment
of  Background  and  Life  Experiences  (ABLE)  and  Army  Vocational  Interest  Career 
Examination  (AVOICE),  were  administered  to  soldiers  at  Fort  Bragg  under 
several  experimental  conditions  to  estimate  the  extent  to  which  scores  on 
these  instruments  could  be  altered  or  "faked"  when  persons  are  instructed  to 
do  so. 


Minneapolis Military Entrance Processing Station - The non-cognitive
measures were administered to a sample of recruits as they were being
processed into the Army, to obtain an estimate of how persons in an applicant
setting  might  alter  their  scores. 


Results of the Cognitive Paper-and-Pencil and
Computer-Administered Field Tests


Psychometric  Data 

The  basic  data  obtained  on  the  cognitive  paper-and-pencil  and  the 
computer-administered tests are portrayed in Tables 2-3 and 2-4, respectively.

Factor  Analysis  Results 

Two  variables,  PS&A  reaction  time  and  Short-Term  Memory  reaction  time, 
were  omitted  because  the  reaction  time  scores  from  these  measures  correlated 
very  highly  with  their  corresponding  slope  or  intercept  variables.  Results 
from  the  seven-factor  solution  of  a  principal  components  factor  analysis  with 
varimax  rotation  are  displayed  in  Table  2-5.  All  loadings  of  .30  or  greater 
are  shown. 

Factor 1 includes eight of the ASVAB subtests, six of the
paper-and-pencil measures, and two cognitive/perceptual computer variables. Because
this  factor  contains  measures  of  verbal,  numerical,  and  reasoning  ability,  it 
was  termed  "g",  to  represent  general  cognitive  ability. 

Factor  2  was  a  general  spatial  factor  and  included  all  of  the  PTB 
cognitive paper-and-pencil measures, Mechanical Comprehension from the ASVAB,
and  Target  Identification  reaction  time  from  the  computer  tests. 

Factor  3  loaded  on  the  three  psychomotor  tests,  with  substantially 
smaller  loadings  from  three  coynitive/perceptual  computer  test  variables,  the 
Path  Test,  and  Mechanical  Comprehension  from  the  ASVAB.  Given  the  high 
loadings  of  the  psychomotor  tests,  it  was  labeled  the  motor  factor. 

Factor  4  included  variables  from  the  cognitive/oerceptual  comput&r 
tests. This  factor  appears  to  involve  accuracy  of  perception  across  several 
tasks  and  types  of  stimuli. 

For Factor 5, the highest loadings were on straightforward reaction time
measures. Consequently, it was interpreted as a speed of reaction factor.

Table 2-3

[Title partly lost] ... and Reliability Estimates for the Ten
Paper-and-Pencil Cognitive Tests

[Table body not recoverable.]

[Note, beginning lost] ... range from 292 to 296 for mean and SD calculations.

The split-half coefficient is computed on pilot test data from Fort Lewis, where two separately timed
halves were given, and is corrected to full test length. Coefficient alphas are based on the Fort Knox
data and are overestimates for the speeded tests. The test-retest interval was two weeks.
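The correction of a split-half coefficient "to full test length" is conventionally done with the Spearman-Brown formula, which the note to Table 2-4 also names. A minimal sketch follows; the half-test correlation used is hypothetical, not a value from the field tests.

```python
def spearman_brown(r, k=2.0):
    """Projected reliability when test length is multiplied by k.

    r is the reliability (or half-test correlation) at the current length;
    k = 2 corresponds to stepping a half-test correlation up to full length.
    """
    return k * r / (1.0 + (k - 1.0) * r)


half_r = 0.70                      # hypothetical correlation between two halves
full_r = spearman_brown(half_r)    # estimated reliability of the full test

print(round(full_r, 3))
```

The same function with k below 1 gives a rough projection of the reliability of a shortened test, under the formula's assumption that the removed items are parallel to those retained.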


Table 2-4


Characteristics of the 19 Dependent Measures for Computer-Administered Tests:
Fort Knox Field Tests

[Table body not recoverable.]

N = 120 for test-retest reliabilities, but varies slightly from test to
test. r_sh = split-half reliability; odd-even item correlation with
Spearman-Brown correction. r_tt = test-retest reliability, 2-week
interval between administrations.

hs = hundredths of a second.

Table 2-5


Principal Components Factor Analysis of Scores of the ASVAB Subtests, Cognitive
Paper-and-Pencil Measures, and Perceptual/Psychomotor Computer-
Administered Tests* (N = 169)

[Columns: Factor 1 through Factor 7 and h2. The column positions of the
loadings were lost in extraction; loadings are listed in printed order.]

Variable               Loadings         h2

ASVAB
  GS                   75               59
  AR                   75               73
  WK                   77               62
  PC                   62               47
  NO                   84               77
  CS                   62               44
  AS                   62               58
  MK                   77               70
  MC                   63, 38, -30      68
  EI                   72               65

COGNITIVE PAPER-AND-PENCIL
  Assemb Obj           35, 69           66
  Obj Rotation         -61              49
  Shapes               66               51
  Maze                 70               67
  Path                 67, -30          65
  Reason 1             37, 58           54
  Reason 2             37, 47           44
  Orient 1             37, 64           58
  Orient 2             40, 46, -30      52
  Orient 3             60, 52           67

PERCEPTUAL COMPUTER
  SRT-RT
  CRT-RT
  PS&A-PC
  PS&A Slope
  PS&A Inter
  Target ID-PC
  Target ID-RT         -41
  STM-PC
  STM-Slope
  STM-Int
  Cannon Shoot-TE
  No Mem-PC            53
  No Mem-RT            -37

PSYCHOMOTOR COMPUTER
  Tracking 1
  Tracking 2
  Target Shoot-TF
  Target Shoot-Dist

Remaining computer-test entries, in printed order (row and column
positions lost):
  67, 63, 61, 31, 88, -65, 50, 37, 40, 30, 38, 39, 51, 32, 37, -46
  86, 77, 42, 64
  44, 50, 70, 81, 74, 25, 57, 34 41, 41 25, 47, 19, 52, 54
  82, 66, 23, 48

Eigenvalues (Factors 1-7): 5.69, 4.70, 2.83, 2.37, 1.92, 1.87, 1.17

Note. Decimals have been omitted from factor loadings.

*The following variables were not included in this factor analysis:
[one name illegible], PS&A Reaction Time, and Short-Term Memory Reaction Time.

h2 = communality (sum of squared factor loadings) for variables.

Factor 6 contained four variables, two from the ASVAB and two from the
cognitive/perceptual computer tests. This factor appears to represent both
speed of reaction and arithmetic ability.

Factor 7 contains three variables from the computer tests. These
include Short-Term Memory percent correct and slope, and Target Shoot
time-to-fire. This factor is difficult to interpret, but was believed to
represent a response style factor. That is, this factor suggests that those
individuals who take a longer time to fire on the Target Shoot Test also tend
to have higher slopes on the Short-Term Memory Test (lower processing speeds
with increased bits of information) but are more accurate, or obtain higher
percent correct values, on the Short-Term Memory Test.

Note that several variables have fairly low communalities. These may be
due to relatively low score variance or reliability, but could also be due to
those variables having unique variance, at least when factor analyzed with
this set of tests. This latter explanation was seen as the more plausible one
for the Cannon Shoot score.


M’l  I«*riW.TiTirii  III 


Correlations with Video Game-Playing Experience. Field test subjects
were asked the question, "In the last couple years, how much have you played
video games?" The five possible alternatives ranged from "You have never
played video games" to "You have played video games almost every day" and
were given scores of 1 to 5, respectively. The mean was 2.99, SD was 1.03
(N = 256), and the test-retest reliability was .71 (N = 113).

The 19 correlations of this item with the computer test scores ranged
from -.01 to +.27, with a mean of .10. A correlation of .12 is significant at
alpha = .05. These findings were interpreted as showing a small, but
significant, relationship of video game-playing experience to the more "game-
like" tests in the battery.
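The statement that a correlation of .12 is significant at alpha = .05 can be checked approximately from the sample size. The sketch below uses Fisher's z approximation with a two-tailed test and assumes N = 256, the sample size reported for the video game item; the report does not say which test or exact N was used for the significance statement, so both are assumptions.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant at two-tailed alpha (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return tanh(z_crit / sqrt(n - 3))


r_min = critical_r(256)   # N assumed equal to the reported item sample size
print(round(r_min, 3))
```

Under these assumptions the threshold falls between .12 and .13, which agrees with the rounded value given in the text.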

Practice Effects on Selected Computer Test Scores. The results of the
analyses of variance for the five tests included in the practice effects
research (Table 2-6) show only one statistically significant practice effect
(the Time x Group interaction), for the Mean Log Distance score on Target
Tracking Test 2. There were three statistically significant findings for
Time, indicating that scores did change with a second testing, whether or not
practice trials intervened between the two tests. Finally, the omega squared
values indicate that relatively small amounts of test score variance are
accounted for by the Group, Time, or Time by Group factors.

These data suggest that the practice intervention was not a particularly
strong one. The average gain score for the two groups across the five
dependent measures was only .09 standard deviation.

Table 2-6

Effects of Practice on Selected Computer Test Scores

                       Dependent                    Source of                        Omega
Test                   Measure                      Variance       df      F         Squared

Choice Reaction Time   Trimmed Mean Reaction Time   Group          1,180   9.7[?]*   .032
                                                    Time           1,180   25.70*    .035
                                                    Time x Group   1,180   .73       --

Target Tracking 1      Mean Log Distance            Group          1,178   .73       --
                                                    Time           1,178   9.26*     .005
                                                    Time x Group   1,178   4.11      --

Target Tracking 2      Mean Log Distance            Group          1,178   .47       --
                                                    Time           1,178   1.30      --
                                                    Time x Group   1,178   7.79*     .005

Cannon Shoot           Time Error                   Group          1,171   3.79      --
                                                    Time           1,171   .16       --
                                                    Time x Group   1,171   5.72      --

Target Shoot           Mean Log Distance            Group          1,171   .41       --
                                                    Time           1,171   9.28*     .012
                                                    Time x Group   1,171   .08       --

*Denotes significance at p < .01.


Field Test Results for the Non-Cognitive
Measures (ABLE and AVOICE)


Data

The Fort Knox data were used to obtain descriptive scale statistics and
examine the covariation among scales. Summary statistics for the ABLE and
AVOICE are presented in Tables 2-7 (ABLE) and 2-8 (AVOICE). The median alpha
coefficient (internal consistency) for the ABLE content scales is .84, and the
median test-retest (2-week interval) correlation is .79, with a range of .68
to .83. The median alpha coefficient for the AVOICE scales is .86, and the
median test-retest correlation is .76.
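For readers unfamiliar with the alpha coefficient reported above, a minimal sketch of its computation from item-level responses follows. The response matrix is hypothetical (five respondents, three items) and is far smaller than any ABLE or AVOICE scale.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a respondents-by-items matrix of item scores."""
    k = len(item_scores[0])                               # number of items
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    total_var = variance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: each row is one respondent, each column one item.
responses = [
    [3, 3, 2],
    [1, 1, 1],
    [2, 3, 3],
    [1, 2, 1],
    [3, 2, 3],
]

alpha = coefficient_alpha(responses)
print(round(alpha, 3))
```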

Fakability Analyses

To investigate intentional distortion of responses, data were gathered
(a) from soldiers instructed, at different times, to distort their responses
or to be honest (experimental data gathered at Fort Bragg); (b) from soldiers
who were simply responding to the ABLE and AVOICE with no particular
directions (data gathered at Fort Knox); and (c) from recently sworn-in Army
recruits at the Minneapolis MEPS.

Table 2-7

ABLE Scale Score Characteristics: Fort Knox Field Test (N = 276 except where otherwise noted)

[Table body not recoverable.]

Table 2-8

AVOICE Scale Score Characteristics: Fort Knox Field Test (N = 270 except where otherwise noted)

[Table body not recoverable.]

N = 127 for test-retest correlations. Test-retest interval was 2 weeks.


The purposes of the faking study were to:

- Determine the extent to which soldiers can distort their responses
to temperament and interest inventories when instructed to do so.
(Compare data from Fort Bragg faking conditions with Fort Bragg
and Fort Knox honest conditions.)

- Determine the extent to which the ABLE response validity scales
detect such intentional distortion. (Compare response validity
scales in Fort Bragg honest and faking conditions.)

- Determine the extent to which ABLE validity scales can be used to
correct or adjust scores for intentional distortion.

- Determine the extent to which distortion is a problem in an
applicant setting. (Compare MEPS data with Fort Bragg and Fort
Knox data.)

The participants in the experimental group were 245 enlisted soldiers in
the 82nd Airborne Brigade at Fort Bragg. Comparison samples were MEPS
candidates (N = 126) and the Fort Knox soldiers described earlier (N = 276).

Four faking and two honest conditions were created:

ABLE - Fake Good

Imagine you are at the Military Entrance Processing
Station (MEPS) and you want to join the Army.
Describe yourself in a way that you think will ensure
that the Army selects you.

ABLE - Fake Bad

Imagine you are at the Military Entrance Processing
Station (MEPS) and you do not want to join the Army.
Describe yourself in a way that you think will ensure
that the Army does not select you.

ABLE - Honest

You are to describe yourself as you really are.

AVOICE - Fake Combat

Imagine you are at the Military Entrance Processing
Station (MEPS). Please describe yourself in a way
that you think will ensure that you are placed in an
occupation in which you are likely to be exposed to
combat during a wartime situation.

AVOICE - Fake Non-combat

Imagine you are at the Military Entrance Processing
Station (MEPS). Please describe yourself in a way you
think will ensure that you are placed in an occupation
in which you are unlikely to be exposed to combat
during a wartime situation.

AVOICE - Honest

You are to describe yourself as you really are.

The design was a 2x2x2 with repeated measures on the faking and honest
conditions, which were counterbalanced. Thus, approximately half the
experimental group, 124 soldiers, completed the inventories honestly in the
morning and faked in the afternoon, while the other half (121) completed the
inventories honestly in the afternoon and faked in the morning. The first
between-subjects factor consisted of these two levels: Fake Good/Want Combat
and Fake Bad/Do Not Want Combat. Order was manipulated in the second
between-subjects factor such that the following two levels were produced:
faked responses then honest responses, and honest responses then faked
responses.

Overall, the ABLE data supported the following conclusions:

- Soldiers can distort their responses when instructed to do so.

- The response validity scales detect intentional faking.

- An individual's Social Desirability scale score can be used to adjust
his or her content scale scores to reduce variance associated with
faking.

- Faking or distortion may not be a significant problem in an applicant
setting.
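The report does not describe the adjustment procedure itself. One common way to use a response validity score for this purpose is to remove from each content scale score the part that is linearly predictable from the Social Desirability score. The sketch below illustrates that approach only as an assumption; the function name and the scores are hypothetical.

```python
from statistics import mean


def residualize(content, social_desirability):
    """Remove the part of content scores linearly predictable from the
    Social Desirability scores, keeping the original content-scale mean."""
    mx, my = mean(social_desirability), mean(content)
    sxx = sum((x - mx) ** 2 for x in social_desirability)
    sxy = sum((x - mx) * (y - my) for x, y in zip(social_desirability, content))
    slope = sxy / sxx
    return [y - slope * (x - mx) for x, y in zip(social_desirability, content)]


# Hypothetical scores for five respondents.
sd_scale = [2, 4, 6, 8, 10]          # Social Desirability scale scores
content = [30, 34, 33, 40, 43]       # scores on one content scale

adjusted = residualize(content, sd_scale)
print([round(a, 1) for a in adjusted])
```

After adjustment the content scores are uncorrelated with the Social Desirability scores in the sample, which is the sense in which variance associated with faking is reduced.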

Overall, the AVOICE data showed the following:

- Soldiers can distort their responses when instructed to do so.

- The ABLE Social Desirability and Poor Impression scales are not as
effective for adjusting AVOICE scale scores as they are for adjusting
ABLE content scale scores.

- Faking or distortion may not be a significant problem in an applicant
setting.

TRANSFORMING THE PILOT TRIAL BATTERY INTO THE TRIAL BATTERY

In the field tests the entire Pilot Trial Battery required approximately
6.5 hours of administration time. However, the Trial Battery, which was the
label reserved for the predictor battery to be used in the full-scale
Concurrent Validation, had to fit in a 4-hour time slot.

Using all the accumulated information, final revisions were made during
a series of meetings attended by the project staff and by the Scientific
Advisory Group. The revisions and the stated reasons for their adoption are
summarized below.

Changes to Cognitive Paper-and-Pencil Tests

Changes to the cognitive paper-and-pencil tests are summarized in
Table 2-9.

The Spatial Visualization construct was measured by three tests:
Assembling Objects, Object Rotation, and Shapes. The Shapes Test was dropped
because the previous evidence of validity for predicting job performance was
judged to be less impressive than for the other two tests. Eight items were
dropped from the Assembling Objects Test by eliminating items that were very
difficult or very easy, or had low item-total correlations. The time limit
was not changed, which made it more a power test than before.

For the Spatial Scanning construct, the Path Test was dropped and the
Mazes Test was retained with no changes. Mazes was a shorter test, showed
higher test-retest reliabilities (.71 vs. .64), and gain scores were lower
(.24 vs. .62 SD unit).
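The item screening described for the Assembling Objects Test (dropping items that are very hard, very easy, or weakly related to the total score) can be sketched as follows. The cutoffs and the response data are hypothetical; the report does not state the thresholds that were applied.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def screen_items(scores, p_low=0.10, p_high=0.90, r_min=0.20):
    """scores: respondents x items, entries 0 (wrong) or 1 (right).
    Returns the indices of items flagged for removal."""
    totals = [sum(row) for row in scores]
    flagged = []
    for i in range(len(scores[0])):
        item = [row[i] for row in scores]
        p = sum(item) / len(item)                      # proportion correct
        rest = [t - v for t, v in zip(totals, item)]   # total without this item
        r = pearson(item, rest)                        # corrected item-total r
        if p < p_low or p > p_high or r < r_min:
            flagged.append(i)
    return flagged


# Hypothetical responses: six respondents, four items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]

flagged = screen_items(scores)
print(flagged)
```

In this toy data set the first item is flagged because everyone answers it correctly, and the last two because their corrected item-total correlations fall below the assumed cutoff.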


Table 2-9

Summary of Changes to Paper-and-Pencil Cognitive Measures in the
Pilot Trial Battery

Test Name              Changes

Assembling Objects     Decrease from 40 to 32 items.
Object Rotation        Retain as is with 90 items.
Shapes                 Drop test.
Mazes                  Retain as is with 24 items.
Path                   Drop test.
Reasoning 1            Retain as is with 30 items.
                       New name: REASONING TEST.
Reasoning 2            Drop test.
Orientation 1          Drop test.
Orientation 2          Retain as is with 24 items.
                       New name: ORIENTATION TEST.
Orientation 3          Retain as is with 20 items.
                       New name: MAP TEST.

Reasoning Test 1 was evaluated as the better of the two tests for
Figural Reasoning because it had higher reliabilities as well as a higher
uniqueness estimate. It was retained with no item or time limit changes, and
Reasoning Test 2 was dropped.

Of the three tests that measured the Spatial Orientation construct,
Orientation Test 1 was dropped because it showed lower test-retest
reliabilities (.67 vs. .80 and .84) and higher gain scores (.63 SD unit vs.
.11 and .08 SD unit).


Changes to Computer-Administered Tests

Besides the changes made to specific tests, several improvements were
made to the computer battery as a whole. The general changes, designed to save
time, were as follows:

- Most instructions were shortened considerably.

- Whenever the practice items had a correct response, the subject was
given feedback.

- Rest periods were eliminated. This was possible because virtually
every test was shortened.

- The total time allowed for subjects to respond to a test item (i.e.,
response time limit) was set at 9.0 seconds for all reaction time
tests.

Changes to the individual computer-administered tests are summarized in
Table 2-10.

Fifteen items were added to Choice Reaction Time in an attempt to
increase the test-retest reliability for mean reaction time.

Twelve items were eliminated from the Perceptual Speed and Accuracy Test
(reduced from 48 to 36 items), primarily to save time. Reduction in the
number of items did not seem to be cause for reliability concerns.

Several changes were made to the Target Identification Test. First, the
"moving" items were eliminated; field test data showed that scores on the
"moving" and stationary items correlated .78, and the moving items had lower
test-retest reliabilities than stationary items (.54 vs. .74). All target
objects were made the same size since field test analyses indicated size had
no appreciable effect on reaction time. A third level of angular rotation was
added so that the target objects were rotated either 0°, 45°, or 75°. Finally,
the number of items was reduced from 48 to 36 to save time; internal
consistency and test-retest estimates indicated that the level of risk
attached to this reduction was acceptable.

Analyses of field test data showed the probe delay period difference did
not significantly affect mean reaction time scores, so it was eliminated from
the Short-Term Memory Test. To save time, 12 items were eliminated.

Table 2-10

Summary of Changes to Computer-Administered Measures in the
Pilot Trial Battery

Test Name                      Changes

COGNITIVE/PERCEPTUAL TESTS

Demographics                   Eliminate race, age, and typing experience
                               items. Retain SSN and video experience
                               items.
Simple Reaction Time           No changes.
Choice Reaction Time           Increase number of items from 15 to 30.
Perceptual Speed & Accuracy    Reduce items from 48 to 36. Eliminate
                               word items.
Target Identification          Reduce items from 48 to 36. Eliminate
                               moving items. Allow stimuli to appear at
                               more angles of rotation.
Short-Term Memory              Reduce items from 48 to 36. Establish a
                               single item presentation and probe delay
                               period.
Cannon Shoot                   Reduce items from 48 to 36.
Number Memory                  Reduce items from 27 to 18. Shorten item
                               strings. Eliminate item part delay
                               periods.

PSYCHOMOTOR TESTS

Target Tracking 1              Reduce items from 27 to 18. Increase item
                               difficulty.
Target Tracking 2              Reduce items from 27 to 18. Increase item
                               difficulty.
Target Shoot                   Reduce items from 40 to 30 by eliminating
                               the extremely easy and extremely difficult
                               items.


The number of items on the Cannon Shoot Test was reduced from 48 to 36.
Reliabilities for the time error scores were high enough to warrant such a
reduction without the expectation of a significant impact on reliability.

Two modifications were made to Number Memory to reduce test
administration time. The item delay period was made a constant (1 second)
rather than treated as a parameter with two levels (0.5 and 2.5 seconds), and
the item string length (number of parts in an item) was changed from 4, 6, or
8 parts to 2, 3, or 4 parts. These changes drastically reduced the time
required to complete the test.

Similar kinds of changes were made to Target Tracking Tests 1 and 2.
Since internal consistency and test-retest reliability estimates were
relatively high, the number of items was reduced from 27 to 18.

Several changes were made to the Target Shoot Test. First, all test
items were classified according to three parameters: crosshairs speed, ratio
of target to crosshairs speed, and item complexity (i.e., number of turns/mean
segment length). Then, items were revised to achieve a balanced number of
items in each cell when the levels of these parameters were crossed. Second,
extremely difficult items were eliminated and item presentation times (the
time the target was visible on the screen) were increased to a minimum of 6
seconds (and a maximum of 10 seconds). This was done to eliminate a severe
missing data problem for such items, which seemed to occur when the target
moved very rapidly, made many sudden changes in direction and speed, or was
shown only a few seconds. The number of items was reduced from 40 to 30 to
save testing time.
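The idea of crossing the three Target Shoot item parameters and checking that items are spread evenly over the resulting cells can be sketched as below. The report does not give the number of levels per parameter; two levels each are assumed here purely for illustration, and the level labels are hypothetical. Only the final count of 30 items comes from the text.

```python
from collections import Counter
from itertools import product

# Hypothetical two-level codings of the three item parameters.
speeds = ["slow", "fast"]            # crosshairs speed
ratios = ["low", "high"]             # ratio of target to crosshairs speed
complexities = ["simple", "complex"] # item complexity

cells = list(product(speeds, ratios, complexities))   # all crossed cells


def cell_counts(items):
    """Count items per (speed, ratio, complexity) cell, including empty cells."""
    counts = Counter({cell: 0 for cell in cells})
    counts.update(items)
    return counts


def is_balanced(items, tolerance=1):
    """True if the fullest and emptiest cells differ by at most `tolerance` items."""
    counts = cell_counts(items).values()
    return max(counts) - min(counts) <= tolerance


# Thirty items dealt out across the cells in rotation.
items = [cells[i % len(cells)] for i in range(30)]
print(is_balanced(items))
```

With 30 items and 8 assumed cells, exact equality is impossible, which is why the check allows a difference of one item between cells.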

Changes to Non-Cognitive Measures (ABLE and AVOICE)

Changes to the non-cognitive measures (ABLE and AVOICE) are summarized
in Table 2-11. Time constraints required a 26 percent reduction in the total
number of ABLE and AVOICE items. The goal was to decrease items on a
scale-by-scale basis, while preserving the basic content of each scale. A
decision was also made to delete the Agriculture scale, the six single-item
Holland scales, and the eight Expressed Interest items.

Table 2-11

Summary of Changes to Pilot Trial Battery Versions of Assessment of
Background and Life Experiences (ABLE) and Army Vocational Interest
Career Examination (AVOICE)

Inventory/Scale                        Changes

ABLE-Total                             Decrease from 270 to approximately
                                       209 items.
AVOICE-Total                           Decrease from 309 to approximately
                                       214 items.
AVOICE Expressed Interest Scales       Drop.
AVOICE Single Item Holland Scales      Drop.
AVOICE Agriculture Scale               Drop.
Work Environment Preference Scales     Move to criterion measure booklet
                                       (delete from AVOICE booklet).


The Trial Battery

The final array of tests for the Trial Battery is shown in Table 2-12.
The Trial Battery was designed to be administered in a period of 4 hours
during the Concurrent Validation phase of Project A, in which data collection
began in FY85. Data collected in that phase would allow the first look at the
validity of Trial Battery measures against job performance criteria.

Table 2-12

Description of Measures in the Trial Battery

                                                         Time Limit
Cognitive Paper-and-Pencil Tests       Number of Items   (minutes)

Reasoning Test                         30                12
Object Rotation Test                   90                7.5
Orientation Test                       24                10
Maze Test                              24                5.5
Map Test                               20                12
Assembling Objects Test                32                16

Computer-Administered Tests            Number of Items   Approximate Time

Demographics                           2                 4
Reaction Time 1                        15                2
Reaction Time 2                        30                3
Memory Test                            36                7
Target Tracking Test 1                 18                6
Perceptual Speed and Accuracy Test     36                6
Target Tracking Test 2                 18                7
Number Memory Test                     28                10
Cannon Shoot Test                      36                7
Target Identification Test             36                4
Target Shoot Test                      30                5

Non-Cognitive Paper-and-Pencil
Inventories                            Number of Items   Approximate Time

Assessment of Background and Life      205               35
Experiences (ABLE)
Army Vocational Interest Career        176               20
Examination (AVOICE)
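As a check on the 4-hour requirement, the times listed in Table 2-12 can be summed. The sketch below transcribes those times and assumes that the "Approximate Time" columns are in minutes, like the paper-and-pencil time limits; the table does not state the unit for those columns.

```python
# Times transcribed from Table 2-12 (minutes assumed for all three sections).
paper_and_pencil = {
    "Reasoning": 12, "Object Rotation": 7.5, "Orientation": 10,
    "Maze": 5.5, "Map": 12, "Assembling Objects": 16,
}
computer = {
    "Demographics": 4, "Reaction Time 1": 2, "Reaction Time 2": 3,
    "Memory": 7, "Target Tracking 1": 6, "Perceptual Speed and Accuracy": 6,
    "Target Tracking 2": 7, "Number Memory": 10, "Cannon Shoot": 7,
    "Target Identification": 4, "Target Shoot": 5,
}
inventories = {"ABLE": 35, "AVOICE": 20}


def total_minutes(*sections):
    """Sum the listed times over any number of test sections."""
    return sum(sum(section.values()) for section in sections)


total = total_minutes(paper_and_pencil, computer, inventories)
slack = 4 * 60 - total   # minutes left in a 4-hour slot

print(f"Listed testing time: {total} minutes; remaining in 4 hours: {slack}")
```

The listed times sum to about 3 hours, leaving roughly an hour of the 4-hour slot for whatever is not itemized in the table.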

Chapter 3

CRITERION DEVELOPMENT


INTRODUCTION

The overall goals of measuring training and job performance (that is,
criteria) in Project A were to define the total domain of performance in some
reasonable way and then develop reliable and valid measures of each major
factor. The specific measures were used as criteria against which to validate
selection and classification tests and were not at the outset intended to
serve as operational methods for appraising performance. The research
participants were informed that the measures would not be entered into their
personnel file.


The  Developmental  Approach 

The general procedure for criterion development in Project A followed a
basic cycle of a comprehensive literature review, conceptual development,
scale construction, pilot testing, scale revision, field testing, and
proponent (management) review. The specific measurement goals were to:

- Make a state-of-the-art attempt to develop job sample
or "hands-on" measures of job task proficiency.

- Compare hands-on measurement to paper-and-pencil tests
and rating measures of proficiency on the same tasks
(i.e., a multitrait, multimethod approach).

- Develop rating scale measures of performance factors
that are common to all first-tour enlisted MOS (Army-wide
measures), as well as for factors that are specific
to each MOS.

- Develop standardized measures of training achievement
for the purpose of determining the relationship between
training performance and job performance.

- Exploit existing file/administrative data as much as
possible for indicators of individual performance.

- Use the data from the Concurrent Validation sample
to develop a model of the latent structure of job
performance in first-tour enlisted MOS.

Given these intentions, the criterion development effort focused on
three major methods of measuring performance: hands-on job sample tests,
multiple-choice knowledge tests, and ratings. The behaviorally anchored
rating scale (BARS) procedure was used extensively in developing the rating
scales.

The Modeling of Performance


The criterion development efforts to be described were guided by a
particular theory of performance. The intent was to proceed through an almost
continual process of data collection, expert review, and model/theory
revision.

Multidimensionality

As a basic concept, job performance was viewed as multidimensional.
There is not one attribute, one outcome, one factor, or one anything that can
be pointed to and labeled as job performance. Further, job performance was
given the status of a construct (which implies a "theory" of performance), and
is manifested by a wide variety of behaviors, or things people do, that are
judged to be important for accomplishing the goals of the organization. For
example, a manager could make contributions to organizational goals by working
out congruent short-term goals for his subordinates, and thereby guiding them
in the right direction, or by praising them for a job well done, and thereby
increasing subsequent effort levels. Each of these activities probably
requires different knowledges and skills, which are in turn most likely a
function of different abilities.

Consequently, for any particular job, one fundamental task of
performance measurement is to describe the basic factors that comprise
performance. That is, how many such factors are there and what is their basic
nature?

Two General Types of Factors

For the population of entry-level enlisted positions in the Army, there
should be two major types of job performance factors: components that reflect
MOS-specific technical competence or specific job behaviors that are not
required for other jobs, and components that are defined and measured in the
same way for every job. The latter have been referred to as "Army-wide"
criterion factors, such as performance on the common tasks for which every
soldier is responsible.

The Army-wide concept incorporates the basic notion that total
performance is much more than task or technical proficiency. It might include
such things as contribution to teamwork, continual self-development, support
for the norms and customs of the organization, and perseverance in the face of
adversity. A much more detailed description of the initial working model for
the Army-wide segment of performance can be found in Borman, Motowidlo, Rose,
and Hanser (1986).

In sum, the working model of total performance with which the project
began viewed performance as multidimensional within the two broad categories
of factors or constructs. The job analysis and criterion construction methods
were designed to describe the content of these factors via an extensive
description of the total performance domain, several iterations of data
collections, and the use of multiple methods for identifying basic performance
factors.

Factors Versus a Composite

Saying that performance is multidimensional does not preclude using just
one index of an individual's contributions to make a specific personnel
decision (e.g., select/not select, promote/not promote). As argued by Schmidt
and Kaplan (1971) some years ago, it seems quite reasonable for the
organization to scale the importance of each major performance factor relative
to a particular personnel decision that must be made, and to combine the
weighted factor scores into a composite that represents the total contribution
or utility of an individual's performance, within the context of that decision.
That is, the way in which performance information is weighted is a value
judgment on the organization's part. The determination of the specific
combinational rules (e.g., simple sum, weighted sum, nonlinear combination)
that best reflect what the organization is trying to accomplish is in large
measure a research question.
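The difference between a simple sum and a weighted sum of factor scores can be made concrete with a short sketch. The factor names, scores, and weights below are hypothetical; they are not the Project A performance factors or any weights used in the research.

```python
def composite(factor_scores, weights=None):
    """Combine standardized factor scores into one index.

    With no weights supplied this is a simple sum; otherwise each factor
    score is multiplied by the importance weight assigned to it.
    """
    if weights is None:
        weights = {name: 1.0 for name in factor_scores}
    return sum(weights[name] * score for name, score in factor_scores.items())


# Hypothetical standardized scores for one individual.
scores = {"technical_proficiency": 0.5, "teamwork": -0.2, "perseverance": 1.0}

# Hypothetical importance weights for one particular personnel decision.
promotion_weights = {"technical_proficiency": 0.5, "teamwork": 0.3, "perseverance": 0.2}

simple = composite(scores)
weighted = composite(scores, promotion_weights)
print(round(simple, 2), round(weighted, 2))
```

A different decision (for example, selection rather than promotion) would use the same factor scores with a different set of weights, which is the sense in which the weighting is a value judgment tied to the decision.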

AJSlrurturiT  Madil 

If  perfortsance  is  characterized  in  the  above  manner,  then  a  more  formal 
way  to  model  performance  is  to  think  in  terms  of  its  latent  structure.  The 
usual  common  factor  model  of  the  latent  structure  is  open  to  criticism  because 
all  of  the  criterion  (i.e.,  performance)  measures  may  not  be  at  the  same  level 
of  explanation,  or  they  may  be  so  qualitatively  different  that  putting  them 
into  the  same  correlation  matrix  does  not  seem  appropriate,  or  two  crueria 
may  not  be  functionally  independent.  One  might  be  a  cause  of  another;  for 
example,  individual  differences  in  training  performance  may  be  a  cause  of 
individual  differences  in  Job  performance. 

From this perspective, the aims of criterion analysis are to use all
available evidence, theory, and professional judgment to (a) identify the
variables that are necessary and sufficient to explain the phenomena of
interest, and (b) specify the nature of the relationships between pairs of
variables in terms of whether they (1) are correlated because one is a cause
of another, (2) are correlated because both are manifestations of the same
latent property, or (3) are independent. The more explicitly the causal
directions and the predicted magnitude of the associations can be specified,
the greater the potential power of the model if it is confirmed by subsequent
empirical data.

Within the structural equation framework there are manifest variables
(operational measures) and latent variables (constructs). The Project A
proposal and research plan dealt explicitly with criterion constructs and
criterion measures.

A few points should be made about this view. First, a lot more is known
about predictor (i.e., ability, temperament, and interest) constructs than
about job performance constructs. There are volumes of research on the
former, and almost none on the latter. Relatively little attention has been
given to conceptualizing performance in clerical, technical, or skilled jobs.

Second, the usual textbook illustration of a latent structural equation
model (e.g., James, Mulaik, & Brett, 1982) shows each latent variable being
represented by one or more manifest operational measures. However, just as it
is easy to think of examples where a predictor test score could be a function
of more than one latent variable (e.g., the score on a computerized two-hand
tracking apparatus could be a function of several latent psychomotor
"factors"), the same will be true of criterion measures. Most of them will
not be unidimensional.

Third, we would be hard-pressed to defend placing the criterion
variables on some continuum from immediate to intermediate to ultimate as a
means for portraying their relative importance or their functional
interrelationships.

Finally, people do not usually work alone. Individuals are members of
work groups or units and it is the unit's performance that frequently is the
most central concern of the organization. However, determining the
individual's contribution to the unit's score is not a simple problem. Further,
variation in unit performance is most likely a function of a number of factors
besides the "true" level of performance of each individual. The quality of
leadership, weather conditions, or the availability of spare parts are
examples of such additional sources of variation in unit performance.

In sum, Project A researchers attempted, in state-of-the-art fashion, to
develop both a theory of entry-level performance in skilled jobs (i.e., as
represented by the population of Army MOS) and to construct multiple valid and
reliable measures of each major performance component. In large measure, the
project was successful in doing so and has now gone far beyond any previous
efforts to account for the totality of job performance.


CRITERION DEVELOPMENT: MOS-SPECIFIC TASK-BASED PERFORMANCE MEASURES

The task analysis-based, MOS-specific criterion measures concern the
assessment of performance on a sample of tasks for a particular MOS that were
identified as representative of all tasks in that MOS. The general procedure
was to develop a careful description of all the major tasks that comprise the
job (i.e., the total population or domain of tasks), draw a sample of these
tasks, and develop multiple measures of performance on each task. (See
Campbell, C. H., Campbell, R. C., Rumsey, & Edwards, 1986.)

The total number of tasks to be sampled was dictated primarily by time
constraints. While the time required to assess performance on individual
tasks would differ by task, a total of 30 tasks for each MOS was taken as a
reasonable planning figure.

For each MOS, all 30 tasks would be assessed with written knowledge
tests. Fifteen of the 30 tasks would also be assessed with hands-on tests.
Finally, task performance ratings would be obtained for the 15 tasks measured
with the hands-on job sample tests, and job history data covering recency and
frequency of performance would be obtained for all 30 tasks. As noted
previously, because of cost considerations the MOS-specific job performance
measures (i.e., the hands-on tests and MOS-specific ratings) could be
developed for only nine of the 19 original MOS in the sample. The nine were
further divided into two groups known as Batch A and Batch B. The MOS in
Batch A were done first; sometimes during the development period the lessons
learned in Batch A led to changes in procedures for Batch B. The remaining 10
MOS became known as Batch Z. The compositions of Batches A, B, and Z are
shown in Table 3-1.

Table 3-1

MOS Grouping for Criterion Development

Batch      MOS

Batch A    13B  Cannon Crewman
           64C  Motor Transport Operator
           71L  Administrative Specialist
           95B  Military Police

Batch B    11B  Infantryman
           19E  Armor Crewman
           31C  Radio Teletype Operator
           63B  Light Wheel Vehicle Mechanic
           91A  Medical Specialist

Batch Z    12B  Combat Engineer
           16S  MANPADS Crewman
           27E  TOW/Dragon Repairer
           51B  Carpentry/Masonry Specialist
           54E  Chemical Operations Specialist
           55B  Ammunition Specialist
           67N  Utility Helicopter Repairer
           76W  Petroleum Supply Specialist
           76Y  Unit Supply Specialist
           94B  Food Service Specialist

Defining  the  Task  Domain 

Enumerating  the  total  task  domain  for  an  MOS  was  based  on  three  primary 
sources: 


MOS-Specific Soldier's Manuals (SM). Each MOS Proponent, the agency
responsible for prescribing MOS policy and doctrine, prepares and publishes a
Soldier's Manual that lists and describes tasks, by skill level, that soldiers
in the MOS are doctrinally responsible for knowing and performing. The number
of tasks per MOS varies widely from a low of 17 Skill Level 1 (SL1) tasks to
more than 130 SL1 tasks.

Soldier's Manual of Common Tasks (SMCT). The SMCT describes tasks that
each soldier in the Army, regardless of his or her MOS, must be able to
perform. The 1983 version contained 78 SL1 tasks.



Army Occupational Survey Program (AOSP). The AOSP obtains task
descriptions by surveying job incumbents with a questionnaire checklist that
includes several hundred items. The items are obtained from a variety of
sources (e.g., the Proponent school), and include and expand the doctrinal
tasks from the preceding two sources. The AOSP is administered to soldiers in
all skill levels of each MOS by the U.S. Army Soldier Support Center. The
analysis of responses by means of the Comprehensive Occupational Data Analysis
Program (CODAP) provides the number and percentage of soldiers at each skill
level who report that they perform each task. The number of activities in the
AOSP surveys for the nine MOS of interest ranged from 546 to well over 800.
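
The kind of tabulation described above (number and percentage of soldiers at
each skill level who report performing each task) can be sketched as follows.
This is an illustrative Python fragment, not CODAP itself; the function name,
input layout, and response data are assumptions.

```python
from collections import defaultdict

def percent_performing(responses):
    """Tabulate, per skill level, how many respondents report each task.

    responses: iterable of (skill_level, set_of_task_ids) pairs, one per
    survey respondent. Returns {skill_level: {task_id: (count, percent)}}.
    """
    respondents = defaultdict(int)                  # respondents per skill level
    counts = defaultdict(lambda: defaultdict(int))  # task counts per skill level
    for skill_level, tasks in responses:
        respondents[skill_level] += 1
        for task in tasks:
            counts[skill_level][task] += 1
    return {
        level: {task: (n, 100.0 * n / respondents[level])
                for task, n in task_counts.items()}
        for level, task_counts in counts.items()
    }

# Hypothetical responses: (skill level, tasks the soldier reports performing).
survey = [
    (1, {"T1", "T2"}),
    (1, {"T1"}),
    (1, {"T1", "T3"}),
    (1, set()),
    (2, {"T2"}),
]
table = percent_performing(survey)
```

Tasks that no respondent reports performing never appear in the output, which
corresponds to the zero-frequency statements that can be dropped from a task
list.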

Proponent agencies were also contacted directly to determine whether
relevant tasks existed beyond those listed in the three primary sources. The
number of additional tasks thus generated was not large, but the tasks were
sometimes significant. For example, the introduction of new equipment added
tasks that had not yet appeared in the written documentation.

The preliminary aggregate list of SM/SMCT tasks and AOSP statements was
carefully edited for redundancies, and items were revised and combined to
achieve a relatively uniform level of generality and format across items. The
result was a refined list of MOS tasks used as a basis for domain review and
consolidation.

At each Proponent a minimum of three senior NCOs or officers reviewed
the refined list for an MOS. These subject matter experts (SME) eliminated
tasks that had been incorrectly included in the domain, for reasons such as
equipment that was being changed, current doctrine not yet reflected in
available publications, and equipment variations that should be combined.

In the final phase, the task lists resulting from domain consolidation
were again reviewed to eliminate tasks that pertained to restricted duty
positions or that were performed only infrequently. The result of this
process was a final task list (or population) for each MOS. Table 3-2 shows
the reduction of the task list during each phase and the reasons for the
reduction, by MOS.

SME Judgments of Task Characteristics

As preparation for selecting 30 representative tasks for each MOS, 15-30
SMEs (NCOs at E-6 or above and officers at grade O-3 or above) rated each task
on a number of characteristics. Three types of judgments were obtained:

Task Clustering. Each task was listed on a 3 x 5 inch card along with a
brief description. SMEs were told to sort the tasks into groups so that all
the tasks in each group were alike, and each group was different from the
other groups.
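
The report does not say here how the individual card sorts were combined into
task clusters. One common way to summarize such data, shown purely as an
assumed illustration, is to compute for every pair of tasks the proportion of
SMEs who placed the two tasks in the same group. The task labels and sorts
below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_sort_proportions(sorts):
    """Proportion of judges who placed each pair of tasks in the same group.

    sorts: list with one entry per SME; each entry is a list of groups,
    and each group is a set of task ids. Returns {(task_a, task_b): p}
    with task_a < task_b; pairs never sorted together are omitted.
    """
    pair_counts = Counter()
    for groups in sorts:
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                pair_counts[(a, b)] += 1
    n_judges = len(sorts)
    return {pair: count / n_judges for pair, count in pair_counts.items()}

# Hypothetical sorts of four tasks by three SMEs.
sorts = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A"}, {"B"}, {"C", "D"}],
]
proportions = co_sort_proportions(sorts)
```

A matrix of such proportions could then be passed to any clustering routine;
that step is omitted because the source does not specify one.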

Task Importance. The procedure for rating task importance was different
for the first four MOS (Batch A) than for the last five MOS (Batch B) that
were analyzed (see Table 3-1). For Batch A, all SMEs were given a European
scenario that specified a high state of training and strategic readiness but
was short of involving actual conflict. After Batch A data were collected,
concern was expressed as to the scenario effect on SME judgments. As a
result, for Batch B three scenarios were used. An "Increasing Tension"
scenario identical to that used in Batch A was retained, and a "Training"
scenario specifying a stateside environment and a "Combat" scenario (European
non-nuclear) were developed. The 30 SMEs for each Batch B MOS were randomly
divided into three groups and each group was given a different scenario as a
basis for judgments.

Table 3-2

Effects of Domain Definition on MOS Task Lists

                                  13B  64C  71L  95B  11B  19E  31C  63B  91A

AOSP Review
  AOSP Statements                 669  677  822  546  822  609  656  633  685
  Deleted - Zero Frequency         67  169  329  197  188  103  134   84  267
  Deleted by SMEs                   -    -   58    -    -    -    -  195   61
  AOSP Statements Used            602  508  435  369  634  506  522  354  357

Domain Consolidation
  Tasks in MOS*                   378  166  203  304  357  338  267  188  251
  Nonapplicable Systems             -    -    -    -    -   50    -    -    -
  Eliminated by Doctrine           23    -    -    -   16   14   97   10   12
  Collective Tasks                 25    -    -    -    5    -    -    -    -
  Combined Systems                 57    -    -    -    -    -    -    -    -
  Reserve Components Tasks          -    -    -    -   15    -    -    -    -
  Tasks in Domain                 273  166  203  304  321  274  170  178  239

Domain Reduction
  Tasks in Domain                 273  166  203  304  321  274  170  178  239
  Restricted Duty Position         44    -   42    -    -    -    -    -    -
  Preliminary Sort                  -    -    -  176    -    -    -    -    -
  Low Frequency (High Skill
    Level/AOSP Only)               53   47    -    -   90   39    -    -    -
  Domain Tasks for SME Judgments  177  119  161  128  231  235  170  178  239

* Task list resulting from the merging of the Soldier's Manuals lists and the
more detailed AOSP descriptions.

For Batch A MOS, the judges were given the tasks on individual cards,
identical to those used in task clustering, and told to rank the tasks from
Most Important to Least Important. For Batch B MOS, judges were provided a
list of the tasks, with descriptions, and asked to rate them on a 7-point
scale from "1 = Not at all important for unit success" to "7 = Absolutely
essential for unit success."

Task Performance Difficulty. To arrive at an indication of expected
task difficulty, SMEs were asked to sort a "typical" group of 10 soldiers
across five performance levels based on how they would expect a typical group
of SL1 soldiers to perform on each task. The standard deviation of this
distribution served as an index of expected performance variability.
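
A minimal sketch of the variability index follows. It assumes the five
performance levels are scored 1 through 5 and that a population standard
deviation is used; the source specifies neither, and the example distribution
is hypothetical. The mean is returned as well because it is a natural
summary of the same distribution, although the report names only the standard
deviation.

```python
from math import sqrt

def distribution_stats(counts):
    """Mean and population SD of a sort of soldiers across ordered levels.

    counts[i] is the number of soldiers an SME placed in level i + 1.
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one soldier must be sorted")
    levels = range(1, len(counts) + 1)
    mean = sum(level * c for level, c in zip(levels, counts)) / n
    variance = sum(c * (level - mean) ** 2 for level, c in zip(levels, counts)) / n
    return mean, sqrt(variance)

# Hypothetical sort of 10 soldiers across five levels for one task.
mean_level, sd_level = distribution_stats([1, 2, 4, 2, 1])

# A sort that places all 10 soldiers in one level has zero variability.
_, sd_flat = distribution_stats([0, 0, 10, 0, 0])
```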

Selection  of  Tasks  To  Be  Tested 

From five to nine project staff, including the individual who had prime
responsibility for that particular MOS, together with six NCO/officer SMEs,
participated in the task selection process for each MOS. The selection panel
was provided the data summaries of the SME judgments and asked to make an
initial selection of 35 tasks to represent each MOS. No strict selection
rules were imposed, although the analysts were told that high importance, high
performance variability, a range of difficulty, and frequently performed tasks
were desirable, and that each cluster should be sampled.

The  next  phase  was  a  Delphi-type  negotiation  among  analysts  to  merge 
their  respective  choices  into  a  consensus  list  of  35  tasks  for  each  MOS. 
Information  on  the  choices  and  rationale  provided  by  each  analyst  in  the 
preceding  phase  was  distributed  to  all  analysts,  and  each  made  a  decision  to 
retain  or  adjust  his  or  her  decisions,  taking  into  account  opinions  others  had 
expressed. For all MOS, three iterations were necessary.

The resulting task selection lists were mailed to each Proponent; a
briefing by Project A staff was provided if requested. A Proponent
representative then coordinated a review of the list by Proponent personnel
designated as having the appropriate qualifications. After some minor
Proponent-recommended adjustments, the final list of 30 tasks was selected for
each MOS.

Assignment of Tasks to Test Mode

The initial development plan required that a job knowledge test be
developed for all 30 tasks, and a hands-on test for 15 of these tasks. The
considerations that constrained selection for hands-on testing were:

•  Fifteen soldiers must complete all 15 hands-on tests in 4 hours.

•  Scorer support would be limited to eight NCO scorers.

•  The hands-on test site must be within walking distance of the
other test activities.

•  Equipment requirements must be kept within reason.

•  The test must be administrable in a number of installations.

On the basis of these constraints, each of the five project analysts
independently reviewed the available information and made a task selection.
Following individual ratings, analysts met in group discussions and proceeded
task by task to resolve differences until a consensus was reached.

Construction  of  Hands-On  and  Knowledge  Tests 

For  both  hands-on  and  knowledge  tests,  the  primary  source  of  test 
content  was  task  analysis  data. 

Hands-On Test Development

The  model  for  hands-on  test  development  emphasized  four  activities: 

•  Determine test conditions. Test conditions were designed to
maximize the standardization of the test between test sites and
among soldiers at the same test site.

•  List performance measures. Performance measures were defined as
either product or process depending on what the scorer was
directed to observe so as to score behavior.

•  State examinee instructions. Examinee instructions were read
verbatim to the soldier and were the only verbal communications
the scorer was allowed to have with the examinee.

•  Develop scorer instructions. These instructions told the scorer
how to set up, administer, and score the test.


Job  Knowledge  Test  Development 

A multiple-choice format was selected, and 4 hours were allocated to the
knowledge testing block for the field trials, to be reduced to 2 hours for
Concurrent Validation testing. Allowing an average of slightly less than one
minute to read and answer one item dictated an average of about nine items per
task.
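
The arithmetic behind the "about nine items per task" figure can be sketched
as follows. The 4-hour block and 30 tasks come from the text; the value of 0.9
minutes per item is an assumed stand-in for "slightly less than one minute."

```python
def items_per_task(block_hours: float, n_tasks: int, minutes_per_item: float) -> float:
    """Average number of items per task that fits in the testing block."""
    total_items = block_hours * 60 / minutes_per_item
    return total_items / n_tasks

# 4-hour field-trial block, 30 tasks, assumed 0.9 minutes per item.
field_trial_items = items_per_task(4, 30, 0.9)

# At exactly one minute per item the same block yields 8 items per task,
# so a pace slightly faster than that is what brings the figure near nine.
one_minute_items = items_per_task(4, 30, 1.0)
```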


Knowledge test development was based on the same information that was
available for hands-on development and emphasized performance knowledge by
attempting to write items that were:

•  Performance-based. Such items require the examinee to select an
answer describing how something should be done. The goal was to
avoid a tendency to cover information about why a step is done or
rely on technical questions about the task or equipment. The
knowledge or recall required was not to exceed what was required
when actually performing the task. Liberal use of quality
illustrations was essential.

•  Focused on performance errors. Performance-based knowledge tests
must focus on what soldiers do when they fail to perform the task
or steps in the task correctly.

Knowledge tests were constructed by project personnel with experience in
test item construction and expertise in the MOS/task being tested. Test items
were reviewed internally by a panel of test experts to ensure consistency
among individual developers.

Pilot  Testing 

Following construction of the tests, arrangements were made through the
Proponent for troop support for a pilot test of the hands-on and knowledge
tests. This procedure was conducted by the test developer. The hands-on
tests involved the support of four NCO scorer/SMEs, five MOS incumbents in
SL1, and the equipment dictated by the test. The knowledge tests utilized the
same four NCO hands-on scorers and five MOS incumbents. The test developer
went through each test, item by item, with all four NCOs simultaneously. The
five incumbents took the test as actual examinees. Revisions were based on
SME and incumbent inputs.


Task-Specific  Performance  Rating  Scales 

Development of hands-on and knowledge tests provided two methods of
measuring the sample of 15 tasks. As a third method, the soldier's peers and
supervisors were asked to rate the soldier's performance on those same 15
tasks by means of a 7-point numerical rating scale. The intent was to assess
performance on the same set of 15 tasks with three different methods. The
rating scales were developed for administration during the field tests.

Job History Questionnaire

Although soldiers in a given MOS share a common pool of potential tasks,
their actual task experience may vary substantially. To assess the likely
impact of experience effects on task performance, and consequently on the
Concurrent Validation strategies, a Job History Questionnaire was developed to
be administered to each soldier. Specifically, soldiers were asked to
indicate how recently and how frequently (in the preceding 6 months) they had
performed each of the 30 tasks selected as performance criteria.


Summary

At this point the initial versions of the hands-on job sample tests and
the multiple-choice knowledge tests had been developed, pilot tested, and
revised. The 7-point task performance rating scales and the Job History
Questionnaire had been constructed.

CRITERION DEVELOPMENT: MOS-SPECIFIC BEHAVIORALLY
ANCHORED RATING SCALES

A major component of Project A criterion development was devoted to
using the critical incident method to identify basic performance factors. The
procedure used to identify MOS-specific performance factors was derived in
large part from procedures outlined by Smith and Kendall (1963) and by
Campbell, Dunnette, Arvey, and Hellervik (1973).

Development  Procedure 

The general development procedure involved the following steps: (a)
conducting workshops to collect performance incidents for the assigned MOS,
(b) editing incidents, (c) conducting the retranslation exercises, (d)
developing behaviorally anchored performance rating scales (BARS), and (e)
revising the scales for use in the Concurrent Validation efforts. (See Toquam
et al., 1986.)

Critical  Incident  Workshops 

Almost all participants were NCOs who were directly responsible for
supervising first-term enlistees and who themselves had spent 2 to 4 years as
first-termers in these MOS. Workshops for each MOS were conducted at six
Continental United States (CONUS) Army posts.

Staff members first described Project A and explained the purpose of the
workshop. Participants were then asked to generate accounts of Army-wide
performance incidents, using examples provided as guides, and to avoid
describing activities or behaviors that reflect general soldier effectiveness
(e.g., following rules and regulations, military appearance), as these
requirements were being identified and described in another part of the
project.

After 4-5 hours, the participants were asked to identify potential job
performance categories, which workshop leaders recorded on a blackboard or
flipchart. Following discussion, the performance incidents written to that
point were reviewed and assigned to one of the categories that appeared on the
blackboard or flipchart. The remaining time was spent generating performance
incidents for those categories that contained few incidents.

Results from the performance incident workshops are reported in
Table 3-3 for Batch A MOS and in Table 3-4 for Batch B MOS.

Incident Retranslation and Construction of Initial Rating Scales

Evidence that the performance dimension system provides a thorough and
comprehensive coverage of the critical job requirements is high agreement
among judges that specific incidents represent particular components (factors)
of performance, that all hypothesized factors can be represented by incidents,
and that all incidents in the sample can be assigned to a factor (if they
cannot, factors may be missing).

This retranslation step can also be used to develop the performance
anchors for each dimension. Participants are asked to rate the level of
performance described in the incident.

Table 3-3

BARS Performance Incident Workshops: Number of Participants and Incidents
Generated by MOS and by Location - Batch A

                                                          Total By
Location                     13B     64C     71L     95B  Location

Fort Ord
  N - Participants            14      10       5      14        43
  N - Incidents              194      80      59     213       547
  Mean Per Participant      13.9     8.0    11.8    15.2      12.7

Fort Polk
  N - Participants            12      15      15      15        57
  N - Incidents              150     240     210     235       835
  Mean Per Participant      12.5    16.0    14.0    15.7      14.7

Fort Bragg
  N - Participants            13      14      11      17        55
  N - Incidents              235     221     218     225       899
  Mean Per Participant      18.1    15.8    19.8    13.2      16.4

Fort Campbell
  N - Participants            13      13      10      11        47
  N - Incidents              195     191     154     238       778
  Mean Per Participant      11.5    13.6    17.1    15.9      14.2

Fort Hood
  N - Participants            13      13      10      11        47
  N - Incidents              160     1C3     133      92       586
  Mean Per Participant      13.9    14.1    13.1     8.4      10.7

Fort Carson
  N - Participants            19      15      13      14        61
  N - Incidents              204     232     215     180       831
  Mean Per Participant      10.7    15.5    16.5    12.9      13.6

Total By MOS
  N - Participants            86      81      63      86       318
  N - Incidents             1159    1147     989    1183      4478
  Mean Per Participant      13.2    14.2    15.7    13.8      14.1

Table 3-4

BARS Performance Incident Workshops: Number of Participants and Incidents
Generated by MOS and by Location - Batch B

                                                             Total By
Location                    11B    19E    31C    63B    91A  Location

Fort Lewis
  N - Participants           16     11      8     10     11        56
  N - Incidents             211    180    124    172    130       817
  Mean Per Participant     18.3   16.4   15.5   17.2   11.8      14.6

Fort Stewart
  N - Participants           14     15     15     16     16        76
  N - Incidents             216    275    256    208    249      1204
  Mean Per Participant     15.4   18.3   17.1   13.0   15.6      15.8

Fort Riley
  N - Participants           18      7     10     11      8        54
  N - Incidents             216    123    127    133     90       689
  Mean Per Participant     12.0   17.6   12.7   12.1   11.3      13.8

Fort Bragg
  N - Participants           13     14     16     15     13        71
  N - Incidents             231    190    220    250    217     1,108
  Mean Per Participant     17.8   13.6   13.8   16.7   16.7      15.6

Fort Sill*
  N - Participants            8      4      3      9     10        34
  N - Incidents              26      0     13     32     20        91
  Mean Per Participant      3.3      -    4.3    3.6    2.0       2.7

Fort Bliss*
  N - Participants           14     14      8     14     13        63
  N - Incidents              93     70     39     71     55       328
  Mean Per Participant      6.6    5.0    4.9    5.1    4.2       5.2

Total By MOS
  N - Participants           83     65     60     75     71       354
  N - Incidents             993    838    779    866    761     4,237
  Mean Per Participant     12.0   12.0   13.0   11.6   10.7      12.0

* Participants at these posts spent most of the time completing retranslation
booklets rather than generating critical incidents.

The retranslation data were analyzed separately for each MOS. The
process included computing for each incident (a) the number of raters,
(b) percentage agreement among raters in assigning incidents to performance
dimensions, (c) mean effectiveness rating, and (d) standard deviation of the
effectiveness ratings.
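
The four per-incident statistics listed above can be sketched as follows.
Two choices here are assumptions rather than statements from the source:
percentage agreement is taken as the share of raters choosing the most
frequently chosen dimension, and the standard deviation is the population
form. The ratings are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def incident_summary(ratings):
    """Summarize one incident's retranslation ratings.

    ratings: list of (dimension, effectiveness) pairs, one per rater.
    """
    n_raters = len(ratings)
    dimension_counts = Counter(dimension for dimension, _ in ratings)
    modal_dimension, modal_count = dimension_counts.most_common(1)[0]
    effectiveness = [value for _, value in ratings]
    return {
        "n_raters": n_raters,
        "modal_dimension": modal_dimension,
        "pct_agreement": 100.0 * modal_count / n_raters,
        "mean_effectiveness": mean(effectiveness),
        "sd_effectiveness": pstdev(effectiveness),
    }

# Hypothetical ratings of one incident by four raters.
summary = incident_summary([("A", 7), ("A", 8), ("A", 6), ("B", 7)])
```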

The next step involved identifying those performance incidents for which
raters agreed reasonably well on performance dimension assignment and
effectiveness level. For each MOS, performance incidents were identified that
met the following criteria: (a) at least 50 percent of the raters agreed that
the incident depicted performance in a single performance dimension, and (b)
the standard deviation of the effectiveness ratings did not exceed 2.0. These
incidents were then sorted into their assigned performance dimensions.
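
The two retention criteria can be expressed as a simple filter. The 50 percent
and 2.0 thresholds come from the text; the record layout, field names, and
example values are hypothetical.

```python
def reliably_retranslated(summaries, min_agreement=50.0, max_sd=2.0):
    """Keep incidents meeting both criteria, grouped by assigned dimension.

    summaries: iterable of dicts with keys "incident_id", "modal_dimension",
    "pct_agreement", and "sd_effectiveness".
    """
    kept = {}
    for s in summaries:
        agrees = s["pct_agreement"] >= min_agreement   # criterion (a)
        consistent = s["sd_effectiveness"] <= max_sd   # criterion (b)
        if agrees and consistent:
            kept.setdefault(s["modal_dimension"], []).append(s["incident_id"])
    return kept

# Hypothetical per-incident summaries.
summaries = [
    {"incident_id": 1, "modal_dimension": "A", "pct_agreement": 80.0, "sd_effectiveness": 1.1},
    {"incident_id": 2, "modal_dimension": "A", "pct_agreement": 45.0, "sd_effectiveness": 0.9},
    {"incident_id": 3, "modal_dimension": "B", "pct_agreement": 50.0, "sd_effectiveness": 2.0},
    {"incident_id": 4, "modal_dimension": "B", "pct_agreement": 90.0, "sd_effectiveness": 2.3},
]
kept = reliably_retranslated(summaries)
```

Incident 2 fails the agreement criterion and incident 4 fails the standard
deviation criterion; incident 3 sits exactly on both thresholds and is kept
because the criteria are stated as "at least" and "did not exceed."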

Results from this sorting are presented for each MOS in Table 3-5.

Revision After Retranslation

The categorization of the original critical incident pool produced a
total of 93 initial performance dimensions for the nine MOS in Batch A and
Batch B, with a range of 7-13 dimensions per MOS. Based on the retranslation
results, a number of the original performance dimensions were redefined,
omitted, or combined. From the original set, six were omitted and four were
lost through combination. One of the omissions was due to the fact that too
few critical incidents were retranslated into it by the judges. The other
five were omitted because the factor represented tasks that were well beyond
Skill Level 1 or were from a very specialized low-density "track" within the
MOS (e.g., MOS 71L F5-Postal Clerk).

After modifying the dimension system using results from the
retranslation exercise, behavioral anchors were developed for each dimension.
This involved sorting effective performance incidents with mean values of 6.5
or higher, average performance with mean values of 3.5 to 6.4, and ineffective
performance with mean values from 1.0 to 3.4, and then summarizing the
information in each group to form three summary behavioral anchors depicting
effective, average, and ineffective performance. Traditional behaviorally
anchored rating scales contain specific examples of job behaviors for each
effectiveness level in a performance dimension. Behavioral summary scales, on
the other hand, contain anchors that represent the behavioral content of all
performance incidents reliably retranslated for that particular level of
effectiveness.
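
The three-way sort that precedes anchor writing can be sketched as below. The
cut points 6.5 and 3.5 come from the text; treating values between the printed
ranges (for example 6.45) as falling in the lower band is an assumption, and
the incident records are hypothetical.

```python
from collections import defaultdict

def anchor_level(mean_effectiveness: float) -> str:
    """Assign an incident to an anchor group by its mean effectiveness rating."""
    if mean_effectiveness >= 6.5:
        return "effective"
    if mean_effectiveness >= 3.5:
        return "average"
    if mean_effectiveness >= 1.0:
        return "ineffective"
    raise ValueError("mean effectiveness below the scale minimum of 1.0")

def group_for_anchors(incidents):
    """Group incident texts by dimension and anchor level.

    incidents: iterable of (dimension, mean_effectiveness, text) tuples.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for dimension, mean_eff, text in incidents:
        groups[dimension][anchor_level(mean_eff)].append(text)
    return {dimension: dict(levels) for dimension, levels in groups.items()}

# Hypothetical retranslated incidents for one dimension.
grouped = group_for_anchors([
    ("A", 7.8, "incident 1"),
    ("A", 4.2, "incident 2"),
    ("A", 2.1, "incident 3"),
    ("A", 6.5, "incident 4"),
])
```

Summarizing the texts within each group into a single anchor statement is a
writing task for the scale developer and is not automated here.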

After the performance rating scales had been developed for each MOS,
these were submitted to intensive review by the project research staff and the
Scientific Advisory Group. Results from these reviews were used to clarify
performance definitions and behavioral anchors.

Field Test Versions of MOS-Specific BARS

The final set of behaviorally anchored rating scales for the nine MOS
for use in the field test contained from 6 to 12 performance dimensions. Each
of the performance dimensions includes behavioral anchors describing
ineffective, average, and effective performance. Raters were asked to use these
anchors to evaluate ratees on a scale ranging from 1 (ineffective performance)

Table 3-5

Behavioral Examples Reliably Retranslated Into Each Dimension
on the BARS Measures

                                                          Number of
Dimension                                                 Examples

Cannon Crewman (13B)
  A. Loading out equipment
  B. Driving and maintaining vehicles, howitzers,
     and equipment
  C. Transporting/sorting/storing and preparing
     ammunition for fire
  D. Preparing for occupation and emplacing howitzer
  E. Setting up communications
  F. Gunnery
  G. Loading/unloading howitzer
  H. Receiving and relaying communications                   19
  I. Recording/record keeping                                29
  J. Position improvement

Motor Transport Operator (64C)
  A. Driving vehicles
  B. Vehicle coupling
  C. Checking and maintaining vehicles
  D. Using maps/following proper routes
  E. Loading cargo and transporting personnel
  F. Parking and securing vehicles
  G. Performing administrative duties
  H. Self-recovering vehicles
  I. Safety-mindedness
  J. Performing dispatcher duties

Administrative Specialist (71L)
  A. Preparing, typing, and proofreading documents      [illegible]
  B. Distributing and dispatching incoming/outgoing
     documents                                          [illegible]
  C. Maintaining office resources                            73
  D. Posting regulations                                     44
  E. Establishing and/or maintaining files IAW TAFFS         50
  F. Keeping records                                         94
  G. Safeguarding and monitoring security of
     classified materials                                    43
  H. Providing customer service                              90
  I. Preparing special reports, documents, drafts,
     and other materials                                     19
  J. Sorting, routing, and distributing
     incoming/outgoing mail                                  28
  K. Maintaining Army Post Office equipment                   2
  L. Keeping Post Office records                             20
  M. Maintaining security of mail                             9

Military Police (95B)
  A. Traffic control and enforcement on post and
     in the field
  B. Providing escort security and physical security
  C. Making arrests, gathering information on criminal
     activity, and reporting on crimes
  D. Patrolling and crime/accident prevention activities
  E. Promoting confidence in the military police by
     maintaining personal and legal standards and
     through community service work
  F. Using interpersonal communication (IPC) skills
  G. Responding to medical emergencies and other
     emergencies of a noncriminal nature

Infantryman (11B)
  A. Ensuring that all supplies and equipment are
     field-ready and available and well-maintained
     in the field
  B. Providing leadership and/or taking charge in
     combat situations
  C. Navigating and surviving in the field
  D. Using weapons safely
  E. Demonstrating proficiency in the
use  of  all  wupoAS.  irsesanta, 
equI'iMint  and  supplies 

F.  Maintaining  sanitary  conditions, 
personal  hygiene,  and  personal 

safety  In  the  field 

C.  Preparing  a  fighting  position 

H.  Avoiding  aneey  detection  during 
nvenent  and  In  established 
defensive  positions 

I.  Operating  a  radio 

J.  Nrfonaing  reconnaissance  and 

Ktrol  actIvltiM 
rforolng  guard  and  security 
duties 

L.  Oaoonstrsting  courage  and  pro¬ 
ficiency  In  engaging  the  enaav 
N.  Cuarding  the  processing  POUs  and 
enatiy  casualties 


Tning  security  of 


(Continued) 
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T«b1t  3-5  (Continued) 

Behavioral  Examples  Reliably  Retranslated  Into  Each  CleMnslon 
on  the  BARS  Heasures 


filiuiisa 


thaber  of 
Cniielti 


Mater  of 

D1»n»1o«  Eximlot 


Arnnr  Crewn  (IK) 

A.  tuintolnlng  tank  hull/tuspontlon  123 

•yttOB  ond  moelitod  oqulpmont 

B.  Molnttining  tank  turret  tyitooi/  37 

. f Iro  control  tyitai 

C.  Drlvlng/recoverino  tank*  BO 

0.  Stewing  and  handling  aaMUiltlon  39 

C.  LeadIng/unleadIng  guns  30 

F.  Maintaining  guns  43 

C.  Engaging  tarMts  with  tank  gum  49 

H.  Oparating  and  aalntalning  39 

cownunicatlen  aqulpeant 

I.  Eatabllihlng  aaeurlty  In  tho  field  S3 

J.  Navigating  11 

K.  Fraparlng/aecuring  tatO  27 


Llght-Vheel  Vehicle  Nachanlc  (69B) 

A.  Inipecting.  tasting,  and  datacting  47 


problans  with  aqulpdant 

B.  Traublaiheotlng  63 

C.  Farforalng  routim  Mintanance  23 

0.  Repair  101 

E.  Uaing  tools  and  tast  oqulpiwnt  68 

F.  tiling  tachniul  decuaantatlon  96 

6.  Vahicia  and  aqulpeant  operation  18 

H.  Recovery  36 

I.  Planning/organizing  jobs  19 

J.  Adalnistrativa  dutioi  41 

K.  Safety  elnMness  ^ 


Mtcel  Special  lit  (BIA) 


Radio  Teletype  Opereter  (SIC) 

A.  Inspecting  aqulpeant  and  trouble-  90 

ihooting  problaea 

B.  Fulling  preventative  ealntanance  79 

and  servicing  aqulpeant 

C.  Initalling  and  preparing  eqiMpeant  162 

for  operation 

0.  Oparating  coanunlcatlona  devices  142 

and  providing  for  an  accurate  and 
tlmsiy  flow  of  inforeatlon 

C.  Preparing  reports  S3 

F.  Halntilnlng  security  of  Mulpesnt  97 

G.  locating  and  providing  aafa  60 

transport  of  equipeent  to  iltaa  _ 

37ff 


A.  Maintaining  and  operating  An^  61 

vehicles 

B.  Maintaining  accountability  of  28 

eadlcal  supplies  and  aqulpeant 

C.  Keeping  eadlcal  records  31 

0.  Attending  to  patients'  concerro  IS 

E.  Frovldlpg  accurate  diagnoses  In  a  11 

clinic,  hospital,  or  fiald  sotting 

F.  Arranging  for  transportation  and/  44 

or  transporting  Injured  personnel 

6.  Olsponslng  nsdieations  42 

H.  Pre^rlng  and  Irapecting  field  site  34 

or  clinic  facilities  In  tha  fiald 

I.  Providing  routim  and  ongoing  95 

patient  care 

J.  Responding  to  aeargancy  situations  142 

K.  Providing  Imtructlon  to  Arey  16 

personnel  _ 

nr 


66 


to  7  (effective  perfornance).  Raters  are  also  asked  to  evaluate  an  incum¬ 
bent's  overall  performance  across  all  MOS-specific  performance  dimensions. 
This  final  rating  scale  is  virtually  the  same  for  all  MOS;  it  includes  three 
anchors  depicting  ineffective,  average,  and  effective  performance. 


CRITERION  DEVELOPMENT:  ARMY-WIDE  RATING  SCALES 
Development of Scales

The development of the Army-wide behavior rating scales (Pulakos &
Borman, 1986) followed the same general procedure used for the MOS-specific
BARS.

Critical  Incident  Workshops  and  Procedures 

Seventy-seven  officers  and  NCOs  participated  in  six  one-day  workshops 
intended  to  elicit  behavioral  examples  of  soldier  effectiveness  that  were  not 
MOS-specific.  A  total  of  1,315  behavioral  examples  were  generated  in  the  six 
workshops. 

Duplicate incidents and incidents that did not meet the criteria
specified (e.g., the incident described the behavior of an NCO rather than a
first-term soldier) were dropped from further consideration. The remaining
1,111 examples were edited to a common format and content analyzed by project
staff to form preliminary dimensions of soldier effectiveness. Specifically,
three researchers independently read each example and grouped together those
examples that described similar behaviors. The sorted examples were then
reviewed and the groupings were revised until each author arrived at a set of
dimensions that were homogeneous with respect to their content.

After discussion among project staff and with a small group of officers
and NCOs at Fort Benning, a consensus was reached on a set of 13 dimensions.
These were then submitted to retranslation.

Retranslation of the Behavioral Examples

The  retranslation  task  was  divided  into  five  parts,  with  each  part 
requiring  a  judge  to  evaluate  216-225  behavioral  examples.  Judges  were 
provided  with  definitions  of  each  of  13  dimensions  to  aid  in  the  sorting,  and 
with  a  1-9  effectiveness  scale  to  guide  the  effectiveness  ratings. 

The number of behavioral examples reliably retranslated for each of the
13 dimensions is shown in Table 3-6. Two criteria were established for
acceptance: greater than 50 percent agreement for the sorting of an incident
into a single dimension, and a standard deviation of less than 2.0 for the
distribution of judges' effectiveness ratings for one incident. These criteria
were met by 870 of the 1,111 examples (78%).
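
The two acceptance criteria can be illustrated with a short sketch. The function name, the data layout (one dimension label and one 1-9 rating per judge), and the use of the sample standard deviation are assumptions made for illustration; the report does not say which form of the standard deviation was computed.

```python
from collections import Counter
from statistics import stdev


def is_reliably_retranslated(assignments, ratings,
                             min_agreement=0.50, max_sd=2.0):
    """Apply the two retention criteria to one behavioral example.

    assignments: the dimension label chosen by each judge (hypothetical layout).
    ratings: each judge's 1-9 effectiveness rating for the same example.
    """
    # Share of judges who chose the most frequently selected dimension.
    modal_count = Counter(assignments).most_common(1)[0][1]
    agreement = modal_count / len(assignments)
    # Both criteria are strict: more than 50% agreement, SD below 2.0.
    # Assumption: sample standard deviation (statistics.stdev).
    return agreement > min_agreement and stdev(ratings) < max_sd


# Hypothetical example: 6 of 10 judges agree and the ratings cluster tightly.
example_ok = is_reliably_retranslated(
    ["B"] * 6 + ["A"] * 4, [7, 8, 7, 6, 8, 7, 7, 8, 6, 7])
```

An example on which exactly half of the judges agree would be rejected under this reading, because the criterion is stated as greater than 50 percent.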

Two pairs of dimensions were combined because of the conceptual similarity
of each of the pairs, resulting in a total of 11 Army-wide dimensions.
Leading Other Soldiers and Supporting Other Unit Members were combined to form
Leading/Supporting; Attending to Detail and Maintaining Own Equipment were
collapsed to form Maintaining Assigned Equipment.


Table 3-6

Behavioral Examples Reliably Retranslated* Into Each Dimension
for Army-Wide Behavior Rating Scales

                                                          Number of
Dimensions                                                Examples

A. Controlling own behavior related to personal           107
   finances, drugs/alcohol, and aggressive acts
B. Adhering to regulations and SOP, and displaying        158
   respect for authority
C. Displaying honesty and integrity                       53
D. Maintaining proper military appearance                 34
E. Maintaining proper physical fitness                    36
F. Maintaining living and work areas to                   23
   Army unit standards
G. Exhibiting technical knowledge and skill               47
H. Showing initiative and extra effort on job/            131
   mission/assignment
I. Developing own job and soldiering skills               40
J. Attending to detail on jobs/assignments/               59
   equipment checks(a)
K. Maintaining own equipment(a)                           46
L. Effectively leading and providing                      71
   motivation to other soldiers(b)
M. Supporting other unit members(b)                       [illegible]

   Total                                                  870

*Examples were retained if they were sorted into a single dimension by
greater than 50% of the retranslation raters and had standard deviations
of their effectiveness ratings of less than 2.0.

(a) These two dimensions were subsequently combined to form a Maintaining
Assigned Equipment dimension.

(b) These two dimensions were subsequently combined to form a Leading/Supporting
dimension.


For each of the 11 dimensions, the reliably retranslated behavioral
examples were then divided into three categories of effectiveness levels on a
9-point scale, and behavioral summary statements were written to capture the
content of the specific examples at low (1-3.49), average (3.5-6.49), and high
(6.5-9) performance levels.
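
As a minimal sketch of this three-way split, the function below assigns an example to a level from its mean effectiveness rating. The function name and the treatment of the boundaries (3.5 and 6.5 as the lower edges of the average and high bands) are assumptions inferred from the ranges quoted above.

```python
def effectiveness_level(mean_rating):
    """Map a mean rating on the 9-point scale to low, average, or high."""
    if not 1.0 <= mean_rating <= 9.0:
        raise ValueError("mean rating must lie on the 1-9 scale")
    if mean_rating < 3.5:      # low band: 1 to 3.49
        return "low"
    if mean_rating < 6.5:      # average band: 3.5 to 6.49
        return "average"
    return "high"              # high band: 6.5 to 9


# Hypothetical mean ratings for three retranslated examples.
levels = [effectiveness_level(m) for m in (2.8, 5.0, 7.9)]
```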

Additional Army-Wide Scales

In addition to the 11 Army-wide BARS, two summary rating scales were
prepared. First, an overall effectiveness scale was developed to obtain
overall judgments of a soldier's effectiveness based on all the behavioral
dimension ratings. Second, an NCO potential scale was developed to assess
each soldier's likelihood of being an effective supervisor as an NCO.

Final List of Army-Wide Behavioral Rating Scales

The 11 Army-wide BARS that were retained plus the overall performance
and NCO potential scales provided the following behavioral rating scales for
the field test:

A.  Technical  Knowledge/Skill 

B.  Effort 

C.  Following  Regulations  and  Orders 

D.  Integrity 

E.  Leadership 

F.  Maintaining  Assigned  Equipment 

G.  Maintaining  Living/Work  Areas

H.  Military  Appearance 

I.  Physical  Fitness 

J.  Self-Development

K.  Self-Control 
Overall  Effectiveness 
NCO  Potential 

Development of Army-Wide Common Task Dimensions

Rating scales covering the common task domain were developed from tasks
appearing in the Skill Level 1 Common Task Soldier's Manual. To develop these
dimensions, a senior staff member content analyzed the specific tasks
contained in the manual (e.g., Read and Report Total Radiation Dose; Repair
Field Wire) and identified 13 common task areas that appeared to reflect in
summary form all of the specific tasks.

Ratings consisted of evaluating how well each ratee typically performed
each task on a 7-point scale. In addition, raters were given the option of
choosing a "0", indicating that they had not observed a soldier performing in
the task area. The 13 common task dimensions are:

A.  See:  Identifying  Threat  (armored  vehicles,  aircraft) 

B.  See:  Estimating  Range 

C.  Communicate:  Send  a  Radio  Message 

D.  Navigate:  Using  a  Map 

E.  Navigate:  Navigating  In  the  Field 

F.  Shoot:  Performing  Operator  Maintenance  on  Weapon  (e.g.,  M16  rifle)

G.  Shoot:  Engaging  Target  with  Weapon  (e.g.,  M16)

H.  Combat  Techniques:  Moving  Under  Direct  Fire

I.  Combat  Techniques:  Clearing  Fields  of  Fire 

J.  Combat  Techniques:  Camouflaging  Self  and  Equipment 

K.  Survive:  Protecting  Against  NBC  Attack 

L.  Survive:  Performing  First  Aid  on  Self  and  Other  Casualties 
M.  Survive:  Knowing  and  Applying  the  Customs  and  Laws  of  War
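
The "0" option means that a rating of zero is a not-observed code rather than a point on the 7-point scale. The sketch below shows one way such ratings might be summarized; the function name and the choice to average only the observed ratings are assumptions for illustration, not a procedure described in the report.

```python
NOT_OBSERVED = 0  # code raters could choose when they had not seen the task area


def mean_observed_rating(ratings):
    """Average the 1-7 ratings for one task area, skipping not-observed codes.

    Returns None when no rater observed the soldier in this task area.
    """
    observed = [r for r in ratings if r != NOT_OBSERVED]
    if any(not 1 <= r <= 7 for r in observed):
        raise ValueError("observed ratings must be on the 1-7 scale")
    if not observed:
        return None
    return sum(observed) / len(observed)


# Hypothetical ratings from four raters; one rater did not observe the task.
navigate_mean = mean_observed_rating([5, 0, 7, 6])
```

Treating the zero as a missing value keeps an unobserved task area from pulling a soldier's average toward the bottom of the scale.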


CRITERION  DEVELOPMENT:  COMBAT  PERFORMANCE 
PREDICTION  RATING  SCALE 

This section describes the development of a combat performance prediction
scale, designed to evaluate performance under degraded conditions and the
increased confusion, workload, and uncertainty of a combat environment. Two
difficulties were recognized. First, although raters may often observe
soldiers in garrison/field exercise performance, opportunities to observe
performance under severely adverse conditions may have been limited. Second,
the majority of peer and supervisor raters have never experienced combat, so
they were being asked to predict how soldiers would perform in a situation
that the raters themselves may not have known first-hand.

A variant of the critical incident approach was used to identify
dimensions of combat effectiveness. The behavioral examples emerging from this
step were content analyzed, and submitted to a retranslation and scaling
procedure. Following field testing, the best items were selected and a
summated rating scale format was developed, which was used in the Concurrent
Validation.

Critical Incident Workshops. Forty-six officers and NCOs participated
in one of four one-day critical incident workshops. All participants were
combat veterans, the large majority with experience in Vietnam. In each
workshop, a staff member first described Project A and explained how the
prediction of combat performance was an integral part of the project. The
workshop leader next presented a preliminary set of literature-based
dimensions of combat effectiveness, and possible modifications and additions
were discussed. The rest of each workshop was devoted to writing and reviewing
the examples.

A total of 361 examples of positive and negative behavior were generated
in the four workshops. After duplicates and items that were specific to
officers, MOS, or equipment were eliminated, 158 usable examples remained. A
review of the critical incidents that had been used in the Army-wide rating
scale retranslation workshops revealed 73 that described behavior in a
combat-type situation, such as behavior under adverse conditions during
training and field exercises. These examples were added to the 158 usable
examples from the combat workshops. The distribution is shown in Table 3-7.

Three staff members independently read each example and grouped those
that described similar behaviors. The content analysis of the incidents
resulted in a reduction of the number of dimensions from 11 to 8. The revised
dimensions are shown in Figure 3-1. Employing the eight dimensions and 231
behavioral examples, materials were developed for retranslation and scaling
workshops.


Table 3-7

Number of Edited Examples of Combat Behavior

Type of            Combat         Army-Wide
Behavior           Workshops      Workshops      Total

Positive              96              42          138
Negative              62              31           93
Total                158              73          231

Retranslation and Scaling Workshops. In the retranslation process,
acceptable agreement was defined as greater than 50 percent of the 16 judges
sorting an example into the same dimension. Of the 231 examples, 108 did not
meet this criterion. The workshop participants also rated each incident in
terms of how well it would discriminate the "best" from the "worst" performer
under adverse conditions. For the summated scale form of the Combat
Performance Prediction Scale, the goal was to select items that represented
the domain of combat effectiveness and discriminated between performance
extremes. The summated scale form was used to anchor performance rating scales
with more general or abstract behavioral examples. These general statements of
performance at different levels of effectiveness add perspective to the
depiction of performance (Borman, 1986, p. 105).

Allowing for time constraints in testing, and eliminating poor items, 80
items were selected. To reduce the administrative burden on any one rater,
two forms (Form A and Form B) were developed. Each contained 60 items: 40
common to both forms and 20 unique to one form.

Review and Rescaling. The two proposed 60-item forms of the Combat
Performance Prediction Scale were reviewed by three company grade Army
officers and three ARI scientists. As a result of that review, three items
common to both forms were deleted and a large proportion of the remaining 77
items were reworded. Since the rewording was extensive, the 77 items were
subjected to a rescaling, using the same workshop procedures as for the
original scaling. Eight officers and one civilian (seven of the nine were
combat veterans) made the "best" and "worst" combat soldier ratings for each
of the 77 items. Only one item was dropped, because it did not discriminate
between effective and ineffective soldiers.
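
The item counts reported above can be checked with simple bookkeeping. The sketch below uses only the figures given in the text; the helper names are hypothetical, and the per-form length after the review (57) is a derived value that the report does not state.

```python
def pool_size(common, unique_per_form, forms=2):
    """Total distinct items across all forms."""
    return common + forms * unique_per_form


def form_length(common, unique_per_form):
    """Number of items a single rater sees on one form."""
    return common + unique_per_form


# Original design: 40 common items plus 20 unique items on each of two forms.
initial_pool = pool_size(40, 20)          # 80 distinct items
initial_form = form_length(40, 20)        # 60 items per form

# Review: three items common to both forms were deleted.
reviewed_pool = pool_size(40 - 3, 20)     # 77 distinct items
reviewed_form = form_length(40 - 3, 20)   # 57 per form (derived, not in the text)

# Rescaling: one further item was dropped (the text does not say which kind).
final_pool = reviewed_pool - 1            # 76 distinct items
```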

CRITERION DEVELOPMENT: ADMINISTRATIVE/ARCHIVAL RECORDS

A major activity within the overall program of performance criterion
development was to explore the use of the archival administrative records as
first-tour job performance criteria and in-service predictors of soldier
effectiveness (Riegelhaupt, Harris, & Sadacca, 1985). The Enlisted Master
File (EMF), the Official Military Personnel File (OMPF), and the Military
Personnel Records Jacket (MPRJ) are the Army records sources that contain
administrative actions that could be used to form measures of first-tour
soldier effectiveness.


A. [heading illegible]

   •  Ability and desire to foster a common spirit of devotion and
      enthusiasm among members of a group

   •  Concern for the physical/emotional welfare of the individual
      members of the group

   •  Commitment to maintaining/enhancing the effectiveness of the group

B. [heading illegible]

   •  Ability to learn quickly and apply the newly acquired knowledge/
      skill in a novel situation

   •  Ability to size up a situation and use available resources to make
      a decision

   •  The exercise of appropriate judgment

C. [heading illegible]

   •  [illegible] for the accomplishment of the task at hand

   •  Concern for conditions that jeopardize the safety of self and
      others

   •  Concern for the maintenance of weapons and equipment, etc.

D. [heading illegible]

   •  Ability and willingness to maintain both physical and medical
      fitness

   •  Physical endurance as demonstrated by little or no reduction in
      performance even after or during prolonged or strenuous activities

   •  Concern for proper health care/hygiene to avoid sickness and
      disease

E. [heading illegible]

   •  Willingness to make sacrifices and endure hardships to accomplish
      mission

   •  Commitment and dedication to accomplishing one's assigned duties/
      responsibilities

   •  Willingness to accept a reasonable amount of risk in the pursuit
      of mission accomplishment

F. [heading illegible]

   •  Knowledge of and ability to coordinate weapons, ammunition, and
      equipment

   •  Ability to perform MOS-specific and common soldiering tasks

G. Psychological Effects of Combat

   •  Reaction to stress associated with shooting and killing, losing a
      unit/team leader, seeing others wounded or killed, waiting for
      orders between engagements, etc.

   •  Ability to perform duties with little or no decrement under
      emotionally stressful situations

H. [heading illegible]

   •  Ability and willingness to take the appropriate action at the
      appropriate time without being told to do so

Figure 3-1. Revised set of combat performance dimensions.


The EMF is an automated inventory of personal data, enlistment
conditions, and military experience for every enlisted individual currently on
the U.S. Army payroll. It contains a large number of variables for each
individual, ranging from pay grade to Skills Qualification Test (SQT) scores
to the Army's operational performance appraisal ratings in the form of the
Enlisted Efficiency Report (EER).

The OMPF is the permanent, historical, and official record of a member's
military service. The information for enlisted personnel is maintained on
microfiche records at the Enlisted Records and Evaluation Center, Fort
Benjamin Harrison.

The MPRJ, or 201 File, is the primary mechanism for storing information
about an individual's service record. Updates/additions/corrections to the
file are made at the time of the action. The MPRJ physically follows the
individual wherever he or she goes and is normally located at the Military
Personnel Office (MILPO) that serves the soldier's unit.

A  series  of  small  pilot  tests  were  conducted  to  explore  the  information 
content  of  each  source,  identify  the  problems  that  would  be  involved  in  using 
it,  and  develop  an  appropriate  data  collection  protocol  that  could  be  used  in 
a  large-scale  systematic  records  search.  In  so  doing,  an  initial  list  of 
potentially  useful  administrative  records  was  identified,  and  is  shown  in 
Table  3-8. 


Comparative Pilot Test


A systematic comparison of the three data sources was carried out on a
pilot sample of 650 records. The original plan was to collect data from the
MPRJ for a sample of 750 soldiers, 150 in each of five MOS at five Army posts.
To achieve this sample size, the records of 200 soldiers at each post were
requested. Data were collected by teams of two research staff members in
2-day visits to each of five posts. Only those soldiers who entered the Army
between 1 July 1981 and 31 July 1982 at an initial grade of PFC or less were
retained. The result was a sample of 650 soldiers in the 05C, 11B, 64C, 71L,
or 91B MOS who had been in the Army between 14 and 27 months.
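
The retention rule for the pilot sample can be sketched as a simple filter. The record layout, field names, and the inclusive treatment of the two boundary dates are assumptions for illustration; only the date window and the PFC (pay grade E-3) ceiling come from the text.

```python
from dataclasses import dataclass
from datetime import date

ENTRY_START = date(1981, 7, 1)   # earliest entry date retained
ENTRY_END = date(1982, 7, 31)    # latest entry date retained
MAX_INITIAL_PAY_GRADE = 3        # PFC corresponds to pay grade E-3


@dataclass
class SoldierRecord:
    """Hypothetical record layout; not the actual MPRJ collection form."""
    soldier_id: str
    mos: str
    entry_date: date
    initial_pay_grade: int       # 1 for E-1, 2 for E-2, and so on


def is_retained(record):
    """True if the soldier meets the entry-date and initial-grade rule."""
    in_window = ENTRY_START <= record.entry_date <= ENTRY_END
    return in_window and record.initial_pay_grade <= MAX_INITIAL_PAY_GRADE


# Hypothetical records requested at one post.
requested = [
    SoldierRecord("S001", "71L", date(1981, 9, 15), 1),
    SoldierRecord("S002", "11B", date(1982, 8, 1), 2),   # entered too late
    SoldierRecord("S003", "64C", date(1982, 1, 10), 4),  # initial grade too high
]
retained_ids = [r.soldier_id for r in requested if is_retained(r)]
```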


Military Personnel Records Jacket (MPRJ) - Official Military Personnel File


Using the records collection form developed to extract records data from
the MPRJ, three research staff members spent 2 days collecting records data
from the OMPFs of 292 soldiers. The 292 individuals represented a random
sample of the 650 soldiers from whose MPRJs administrative records data had
previously been collected. The MPRJ was found to be a much richer source than
the OMPF for information on the administrative actions of interest in Project
A. In the extreme case, even information relevant to a soldier's reenlistment
eligibility was not available from the OMPF.


Table 3-8

List of Administrative Measures Indicative of Soldier Effectiveness

•  Comparison of Skill Level of Primary to Duty MOS.
•  Existence of Secondary MOS.
•  Existence of Skills Qualification Identifier (SQI).
•  Existence of Additional Skill Area (ASI).
•  Existence of Language Identifier.
•  Record of Skill Qualification Test (SQT) Score Within Past 12 Months.
•  Type of Reenlistment Eligibility.
•  Type of Military Education Leadership Course.
•  Level of Highest Civilian Education.
•  Promotion Rate.
•  Existence of Promotion Packet at E4.
•  Number and Type of Awards/Badges.
•  Record of Requalification Weapons Score Within Past 12 Months.
•  Number and Type of Certificates of Achievement/Appreciation/
   Commendation.
•  Number and Type of Letters of Appreciation/Commendation.
•  Number and Type of Letters of Reprimand/Admonition.
•  Number of Additional Civilian Education Classes Completed.
•  Number and Type of Correspondence Courses Completed.
•  Course Summary and Abilities Ratings - Service School.
•  Professional Competence and Standards Ratings and Summary Score
   of Enlisted Efficiency Report.
•  Type, Sentence, Suspension, Vacation of Courts-Martial.
•  Existence of Courts-Martial Proceedings in Action Pending.
•  Reason for Bar to Reenlistment.
•  Number and Duration of AWOL.
•  Number of Violations, and Reason for Articles 15.
•  Reason for FLAG Action.
•  Number of and Reason for Disposition - Block to Promotion.


Military Personnel Records Jacket (MPRJ) - Enlisted Master File


Unlike the MPRJ-OMPF comparison, a rather high degree of correspondence
existed between the MPRJ and the EMF. Even in light of delays in data entry,
the correspondence between sources was impressive and highlighted the benefits
of having current EMF information available.


Variable  Selection 


A first step in determining the usefulness, for Project A purposes, of
the administrative variables collected from MPRJs (201 Files) was to select
those measures with an acceptable amount of variance. Based upon the
frequency distributions and intercorrelations of the possible indexes, and
regulations governing reenlistment and promotion criteria, six variables were
selected as having the highest potential for being useful criteria and
in-service predictors for Project A:

•  Eligible  to  Reenlist 

•  Number  of  Letters/Certificates 

•  Number  of  Awards 

•  Number  of  Military  Training  Courses 

•  Has  Received  Article  15/FLAG  Action

•  Promotion  Rate  (Grades  Advanced/Year) 

Relationships  of  Administrative  Measures  With  Other  Variables 

Each of the six administrative measures and a combined "Has Received
Letter/Certificate/Award" variable were subjected to a series of analyses.
These included an examination of MOS and Post differences; stepwise multiple
regressions, in which AFQT, Moral Waiver, Sex, and Race were entered after
controlling for Post and MOS effects; and univariate analyses, in the form of
chi-square tests, for those variables entered into the regression equation
with a significant F value at the time of first entry.
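
For readers unfamiliar with the univariate step, the sketch below computes a Pearson chi-square statistic for a 2 x 2 table, such as sex by whether an award was received. The counts are invented for illustration and are not Project A data; the report does not give the table dimensions or test details, so the 2 x 2 layout is an assumption.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2 x 2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


# Critical value of chi-square with 1 degree of freedom at alpha = .05.
CRITICAL_VALUE_DF1 = 3.841

# Hypothetical counts: rows are two groups, columns are award / no award.
hypothetical_counts = [[10, 20], [20, 10]]
statistic = chi_square_2x2(hypothetical_counts)
is_significant = statistic > CRITICAL_VALUE_DF1
```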

First, there was no evidence that a soldier's race was a significant
determiner of his or her Reenlistment Eligibility, Number of Awards, or any
other of the Army-wide administrative measures. Second, although a soldier's
sex was related to Awards (males received more) and to Letter/Certificate
(females received more), when the two variables were combined into the
Letter/Certificate/Award measure, sex differentials were no longer
statistically significant.

Third, Armed Forces Qualification Test (AFQT) score or mental category
(see Appendix A) was related to successfully completing Military Training
Courses and to Number of Awards, indicating the possible usefulness of the
ASVAB in predicting aspects of Army-wide performance. Fourth, both
Reenlistment Eligibility and Promotion Rate (from E-1 to E-4), which may be
related to non-cognitive as well as cognitive factors, do not appear to be
dependent on the soldier's location (Post), MOS, or demographic group (i.e.,
these measures seem to be fairly even-handedly administered Army-wide).

Finally, there were distinct MOS and post differences in average scores
for most of the measures. For example, Administrative Specialists (71L)
received more letters/certificates and Infantrymen (11B) more awards than
soldiers in other MOS. Soldiers at one of the five posts visited received
more letters, certificates, and awards, and more extra training than soldiers
at the other posts. Care should be exercised in pooling performance
measurement data across MOS and posts.

Criterion Field Tests: Self-Reports of Administrative Actions

While the use of administrative measures is consonant with the Project A
multimethod approach to performance measurement, and while these indexes hold
promise as predictors of second-tour performance, it must be asked whether the
effort and expense of collecting these indexes from the 201 Files are
justified by the outcome. Also, while there was a high degree of
correspondence between information on the EMF computerized file and
information collected from the individual 201 Files, a number of the most
promising variables were not available from the EMF.

Accordingly, another method of obtaining information was tried out. A
self-report instrument, the Personnel File Information Form, was developed and
administered during the Batch A field testing. The self-report information
could then be compared to the information in actual 201 Files, obtained by the
project team during the field test period.

CRITERION  DEVELOPMENT:  MEASURES  OF  TRAINING  SUCCESS 

Training achievement tests were developed to measure training success
for the 19 MOS in the Project A sample (Davis, R. H., Davis, G. A., Joyner, &
deVera, 1986). The training performance measures were to serve both as
criteria for selection/classification predictor validation and as in-service
predictors of later job performance. A longstanding question is whether
training performance criteria and job performance criteria provide the same
information about predictor validity.

Within the Army, there is a very close relationship between training
content and tasks performed on the job. As a matter of doctrine, training
must be job-related, and the knowledges and skills necessary for the
performance of a job at Skill Level 1 are taught in Advanced Individual
Training. As a result, if content validity is based on curricular materials
alone, then by design most of the items should be job-related.

There are perhaps three critical components of content validity in this
context. First, the content domain should be clearly defined and the
boundaries of the domain from which test content is drawn should be clearly
understood. Once the boundaries are defined, experts should be able to agree
as to whether or not items fall inside or outside of those boundaries. For
training content, the domain was described by the Program of Instruction (POI)
lesson plan, technical publications, and training manuals. For the job,
content was specified by Army Occupational Surveys, technical publications,
Soldier's Manuals, and the Common Task Manual. Second, the sample of content
to  be  tested  should  be  representative  of  the  domain.  Third,  the  content  to  be 
tested  should  be  highly  relevant  for  the  goals  of  training. 

Also,  it  seems  clear  that  some  trainees  learn  relevant  knowledges  and 
skills  that  are  not  part  of  the  explicit  goals  of  instruction  and  go  beyond 
the  formal  course  content.  From  the  perspective  of  criterion  development,  the 
most successful trainee is one who goes beyond the formal course objective.
This is a distinction between direct and incidental learning. A relevant
question is the degree to which the correlation between training performance
and job performance is a function of direct learning during training,
incidental learning during training, or individual differences in basic
abilities that are present before training starts.

Test Development Procedure

The  principal  steps  in  the  construction  of  the  training  achievement 
tests  were  as  follows: 

- Preparation of the item "budget" to ensure coverage of duty areas per MOS

- Development of the initial item pool

- Review of item pool by job incumbents

- Review of item pool by school trainers

- Pilot administration of items to trainees

- Preparation of the item pools for administration to job incumbents

- Administration to job incumbents (field tests)

- Review by TRADOC Proponent agencies

- Preparation of the item pools for administration to job incumbents in
the Concurrent Validation

Although each test went through many revisions during this process,
there were three principal versions: (a) the initial item pool, (b) the
version administered to incumbents in the field test, and (c) the version
administered to incumbents in the Concurrent Validation. Figure 3-2
summarizes the developmental procedures and illustrates the difference in
procedures for Batch A/B and Batch Z.

Development of the Initial Item Pool

The initial content source was the Army Occupational Survey Program
(AOSP), which uses a questionnaire checklist of several hundred items to survey
job incumbents about specific job tasks that they do or do not perform.
Related tasks are combined into duty areas, and the number of duty areas in
each of the 19 MOS ranged from 15 to 23. A key statistic reported is the
percentage of soldiers at different skill levels who are performing the task
activity.

Before  the  AOSP  items  were  used,  99  percent  confidence  intervals  were 
computed  for  the  mean  percentage  performing  each  task,  and  tasks  equal  to  or 
less  than  the  lower  boundary  of  the  confidence  interval  were  deleted.  The 
remaining  task  statements  were  then  reviewed  by  4-6  SMEs  for  relevance  and 
clarity  and,  using  the  following  procedure,  an  item  budget  was  drafted  with  an 
initial  target  of  225  items. 
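
The screening rule described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the project's actual procedure: it assumes a normal-approximation interval (z = 2.576 for 99 percent) around the mean of the task percentages, and the function name `screen_tasks` and the sample percentages are hypothetical.

```python
import math
import statistics


def screen_tasks(pct_performing, z=2.576):
    """Return (lower_bound, kept_tasks) for a dict of task -> percent performing.

    Assumption: the interval is a normal-approximation CI around the mean
    percentage across tasks; z=2.576 corresponds to 99 percent confidence.
    """
    values = list(pct_performing.values())
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    lower_bound = mean - z * std_error
    # Tasks equal to or below the lower bound are deleted.
    kept = {task: pct for task, pct in pct_performing.items() if pct > lower_bound}
    return lower_bound, kept


# Hypothetical percentages of soldiers performing each task.
sample = {"task_a": 62.0, "task_b": 48.0, "task_c": 5.0, "task_d": 55.0, "task_e": 70.0}
lower_bound, kept = screen_tasks(sample)
print(round(lower_bound, 1), sorted(kept))
```

In this made-up sample only the rarely performed task falls at or below the lower bound and is dropped; the remaining task statements would then go on to SME review.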

The match between AOSP duty areas and training objectives was determined
by preparing a matrix of the AOSP duty areas versus the subdivisions of the
POI.  Three  outcomes  were  possible:  (a)  duty  areas  matched  Army  training 
lessons  completely;  (b)  duty  areas  did  not  match  any  training  lesson;  (c) 
training  lessons  did  not  match  any  duty  area.  The  majority  of  the  item 
budget,  200  items,  was  allocated  to  the  first  two  categories. 

Items were then budgeted in proportion to how much they were emphasized
in training: The greater the overlap between the AOSP tasks (within a duty
area) and the training objectives (within the POI), the more items were
written to represent job/training content. The remaining items (out of the
original 200) were assigned to job-only content.
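
One way to picture proportional budgeting is a largest-remainder allocation. The sketch below is an assumption-laden illustration only: the source does not say how fractional items were rounded, and the function name `allocate_items`, the duty-area names, and the overlap counts are all hypothetical.

```python
def allocate_items(overlap_by_area, total_items):
    """Split total_items across duty areas in proportion to overlap counts.

    Assumption: fractional shares are resolved with the largest-remainder
    method; the report does not describe the rounding rule actually used.
    """
    total_overlap = sum(overlap_by_area.values())
    if total_overlap <= 0:
        raise ValueError("at least one duty area must overlap with training")
    exact = {area: total_items * n / total_overlap for area, n in overlap_by_area.items()}
    budget = {area: int(share) for area, share in exact.items()}
    leftover = total_items - sum(budget.values())
    # Hand the remaining items to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda a: exact[a] - budget[a], reverse=True)
    for area in by_remainder[:leftover]:
        budget[area] += 1
    return budget


# Hypothetical counts of AOSP tasks that match POI training objectives.
overlap = {"maintenance": 12, "communications": 5, "first_aid": 3}
budget = allocate_items(overlap, 50)
print(budget)
```

The allocation always sums to the requested total, so the overall size of the item budget is preserved while the emphasis follows the degree of job/training overlap.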

After item budgets were established, written materials dealing with job
training activities were examined and multiple-choice items were drafted for
all MOS. The item-writing group included the research staff and contract
item-writers.

Review by Job Incumbents


After the pool was first reviewed by one subject matter expert who
purged the item pool of its more glaring faults, the items were then reviewed
by job incumbents for accuracy and relevance during a series of site visits,
and items were revised where appropriate. Incumbents were next asked to rate
the importance of each item on a 5-point scale in three different contexts:
combat, combat readiness, and garrison duty.

Mean interrater reliabilities were reasonably high for the combat and
combat readiness scenarios, .74 and .71 respectively, but somewhat lower for
the garrison scenario, .60. To establish the relevance of the draft test
items, incumbents were asked, "Do Skill Level 1 personnel in this MOS need to
use this knowledge on the job?"

Review by School Trainers

The item pool was also reviewed by trainers at one of the training sites
for the MOS. As with the review by job incumbents, the trainers reviewed
items for technical accuracy and appropriate vocabulary, and rated item
content for importance and relevance to the goals of training. It was during
such site visits that pilot tests were conducted with trainees, as described
in the next subsection.

After review by job incumbents and trainers, test items were
administered to groups of trainees in their last week of training. A sample of
trainees was also interviewed after the test to obtain information about the
clarity and comprehensibility of the items.

Preparation of Batch A and Batch B Training Achievement Tests for Field Tests
With Job Incumbents

After all the SME judgments were made and trainee tryouts completed, the
items were revised in accordance with the SME and trainee comments and the
item pools were prepared for administration to job incumbents in the field
tests. Data from the field test administration were later used to convert the
pools of draft items into the standardized training knowledge tests.

As the item pools were cut and items added or changed in these early
test construction steps, items were dropped if they were judged to be of
little importance or no relevance. However, the nature of the item budget was
preserved by adding new items if necessary.

Field  Test  Instruments 

At this stage the nine training achievement tests for the MOS in Batch A
and Batch B were deemed ready for field testing with job incumbents.

Up to this point the 10 tests for the 10 MOS in Batch Z followed the
same developmental steps as for the tests in Batches A and B. However, as
noted previously, the Batch Z instruments were not field tested with job
incumbents. Consequently, the Concurrent Validation versions of the 10 tests
retained more items than did the nine A/B tests. Additional item analyses were
carried out for Batch Z on the basis of the data from the Concurrent
Validation sample.


CRITERION  FIELD  TESTS 

The complete array of specific criterion measures evaluated in the
criterion field test is given below. Again, the distinction between
MOS-specific and Army-wide is that the latter are the same across all MOS. The
content of the MOS-specific measures, regardless of whether they are job
samples, knowledge tests, or ratings, concerns a particular job and is based
on the task content of that job. Also, the judgment (i.e., rating) of "NCO
potential" refers to a first-tour enlisted soldier's potential, assuming the
individual would reenlist, for being an effective noncommissioned officer,
with supervisory responsibilities, during the second tour of duty.


MOS-Specific Performance Measures

1) Paper-and-pencil tests of achievement during training,
consisting of job-relevant knowledge tests of 100 to
200 items per MOS. Items can be aggregated by POI
module or by MOS duty area.

2) Paper-and-pencil tests of knowledge of task
procedures consisting of an average of about nine
items for each of 30 major tasks for each MOS.
Item scores can be aggregated in at least four ways.


-  Sum  of  item  scores  for  each  of  the  30  tasks. 

-  Total  score  for  15  tasks  also  measured  hands-on. 

-  Total  score  for  15  tasks  not  measured  hands-on. 

-  Total  score  on  all  30  tasks. 

3) Hands-on measures of proficiency on tasks for each
MOS, measured on 15 tasks selected from the 30 tasks
measured with the paper-and-pencil test.

-  Individual  task  scores. 

-  Total  score  for  all  15  tasks. 

4) Ratings of performance, using a 7-point scale, on each
of the 15 tasks measured via hands-on methods by:

-  Supervisors 

-  Peers 

-  Self 

5) Behaviorally anchored rating scales of 6-12
performance dimensions for each MOS by:

- Supervisors

- Peers

- Self

6) A general rating of overall MOS task performance by:

-  Supervisors 

-  Peers 

-  Self 

7) A job history questionnaire administered to incumbents
to determine the frequency and recency of task
performance on the 30 tasks for which job knowledge tests
were developed.


Army-Wide  Measures 


1) Eleven behaviorally anchored rating scales designed
to  assess  the  dimensions  listed  below.  Three  sets 
of  ratings  (i.e.,  from  supervisors,  peers,  and 
self)  were  obtained  on  each  scale  for  each  individual. 


-  Technical  Knowledge/Skill 

- Initiative/Effort

- Following Regulations/Orders

-  Integrity 

-  Leading  and  Supporting 

-  Maintaining  Assigned  Equipment 

-  Maintaining  Living/Work  Areas 

-  Military  Appearance 

-  Physical  Fitness 

-  Self-Development 

-  Self-Control 


2)  A  rating  of  general  overall  effectiveness  as  a  soldier  by: 

-  Supervisors 

-  Peers 

-  Self 


3)  A  rating  of  noncommissioned  officer  potential  by: 

-  Supervisors 

-  Peers 

-  Self 

4) A rating of performance on each of 14 common tasks
from  the  Manual  of  Common  Tasks  by: 


-  Supervisors 

-  Peers 

-  Self 


5) A 77-item summated rating scale of expected combat
effectiveness.


- Supervisors

- Peers

- Self

6) A 14-item self-report measure (the Personnel File
Information  Form)  of  certain  administrative  indexes 
such  as  awards,  letters  of  recommendation,  and 
reenlistment  eligibility. 

7) The same administrative indexes taken from 201
Files (by project staff).

8) An Environmental Questionnaire, a descriptive questionnaire
completed by both incumbents and supervisors for the
purpose of describing 14 factors pertaining to organizational
climate, structure, and practice (Peterson, Hough, Ashworth,
& Toquam, 1986).

9) A 99-item Leader Behavior Questionnaire to measure incumbents'
perceptions of the leadership behaviors and practices
in their unit (White, Gast, & Rumsey, 1986).

10) A Measurement Method Questionnaire administered at the
end of the testing sessions to obtain soldiers' reactions to
the various types of testing.

Samples

The  samples  for  the  field  tests  were  drawn  from  the  nine  Batch  A  and 
Batch  B  MOS  and  from  six  different  locations.  Tables  3-9  and  3-10  provide  a 
breakdown  of  the  criterion  field  test  sample  sizes  by  MOS  and  location,  and  by 
race  and  sex,  respectively.  The  USAREUR  data  collection  site  was  just  outside 
Frankfurt,  Germany. 


Table 3-9

Field Test Sample Soldiers by MOS and Location

                                         MOS
Location        11B   13B   19E   31C   63B   64C   71L   91A   95B   Total

Fort Hood        --    --    --    --    --    --    48    --    42      90
Fort Lewis       29    --    30    16    13    --    --    24    --     112
Fort Polk        30    --    31    26    26    --    60    30    42     245
Fort Riley       30    --    24    26    29    --    21    34    30     194
Fort Stewart     31    --    30    23    27    --    --    21    --     132
USAREUR          58   150    57    57    61   155    --    58    --     596

Total           178   150   172   148   156   155   129   167   114   1,369


Table 3-10

Field Test Sample Soldiers by Gender and Race

Race         Male   Female   Total

Black         330       58     388
Hispanic       37        3      40
White         789      104     893
Other          43        5      48

Total       1,199      170   1,369

Procedure 

For  the  purpose  of  data  collection  in  the  field  tests,  the  criterion 
measures were divided into four major blocks corresponding to:

1) Hands-on (job sample) measures (HO).

2) Rating measures (R), both Army-wide and MOS-specific.

3) Paper-and-pencil measures of job knowledge (K|).

4) Paper-and-pencil measures of training achievement (K,).

Each  block  comprised  one-half  day  of  participant  time  and  each  participant  was 
tested  for  a  2-day  period. 

During the week preceding data collection at each research site, the
scorers for the hands-on (job sample) measure were given 2 days of training on
scoring  procedures,  test  standardization,  and  the  overall  design  and 
objectives  of  Project  A. 


Analysis 

The general data analytic steps were straightforward and consisted of
the following:

(1) An item analysis summary table for each knowledge test for
each MOS. The table for each MOS summarized item discrimination
indexes, item difficulties, and the frequency of items that were
flagged for various kinds of potential keying errors (e.g.,
negative correlation with total score, high frequency of response
for incorrect answer).

(2) An item (where task = item) analysis for each hands-on (job
sample) test.

(3) Frequency distribution and scale statistics for each rating scale
for each MOS.

(4)  Interrater  reliabilities  for  the  individual  rating  scales. 

(5)  Split-half  correlations  (Spearman-Brown  estimates)  for  the 
knowledge  tests  and  hands-on  measures,  test-retest  coefficients 
for  the  hands-on  measures,  and  internal  consistency  indexes  where 
applicable. 

(6)  A  complete  intercorrelation  matrix  of  all  the  criterion  variables 
for  each  MOS  down  to  the  scale  score  and  task  score  level  (i.e., 
the  matrix  included  all  the  variables  listed  in  the  previous 
sections). 

(7) A set of reduced intercorrelation matrixes that included subsets
of the total array of variables.

(8)  Factor  analyses  for  selected  matrixes,  primarily  those  having  to 
do  with  the  rating  scale  measures. 
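
The item statistics named in step (1) can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the project's analysis code: items are assumed to be scored 0/1, discrimination is taken as the correlation between an item and the total of the remaining items (a corrected item-total correlation, chosen here for illustration), and the function names and response data are hypothetical.

```python
import statistics


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either variable is constant."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def item_analysis(responses):
    """responses: one list of 0/1 item scores per examinee."""
    totals = [sum(row) for row in responses]
    report = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - s for t, s in zip(totals, item)]  # total without this item
        discrimination = pearson(item, rest)
        report.append({
            "item": j,
            "difficulty": sum(item) / len(item),  # proportion answering correctly
            "discrimination": discrimination,
            "flag": discrimination < 0,  # possible keying error
        })
    return report


# Hypothetical responses: the last item behaves as if it were miskeyed.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
report = item_analysis(responses)
print([entry["flag"] for entry in report])
```

A flagged item is only a candidate for review; as the text notes, a negative correlation is one of several signals of a potential keying error.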

The  results  of  the  above  analyses  were  prepared  in  a  master  data  book 
for each MOS. Each data book contained item and scale analyses,
intercorrelations down to the scale and subscale level, and factor analyses of
selected  data  sets. 

These  data  were  then  carefully  scrutinized  by  a  designated  criterion 
analysis group. The group included the principal investigator for each of the
criterion  measures,  the  principal  scientist  for  the  project,  the  ARI  chief 
scientist  and  task  monitors  for  the  project,  and  the  assistant  project 
director,  who  served  as  chair. 

The objectives of the group were to review the results of the field
tests and agree upon the specific revisions to be made in each criterion
measure before the criterion array was declared the set of criterion measures
that would be used for the Concurrent Validation.


FIELD  TEST  RESULTS 
Job  Knowledge  Tests 

Between 14 and 18 percent of the items in each MOS item set were revised
as a consequence of field test experience, and between 17 and 24 percent of
the items were dropped. The median difficulty levels were 55 to 58 percent
for five of the MOS, with the MOS 63B, 91A, 19E, and 95B tests having medians
of 65 to 74 percent. Although some skew in item difficulties was observed, it
was  not  extreme. 

The  means,  standard  deviations,  and  reliabilities  for  the  total  test 
score  in  each  MOS  are  shown  in  Table  3-11.  The  reliabilities  are  split-half 
coefficients, using 15 task tests in each half, corrected to a total length of
30 task tests.

Table 3-11

Means, Standard Deviations, and Split-Half Reliabilities for
Knowledge Test Components for Nine MOS

                                        Mean    Standard    Split-Half
MOS                                     (%)     Deviation   Reliability(a)

13B - Cannon Crewman                    58.9      12.6         .86
64C - Motor Transport Operator          60.3      10.1         .79
71L - Administrative Specialist         55.8      10.4         .81
95B - Military Police                   66.4       9.2         .75
11B - Infantryman                       56.0      10.5         .91
19E - Armor Crewman                     64.0      10.1         .90
31C - Single Channel Radio Operator     57.7       9.6         .84
63B - Light Wheel Vehicle Mechanic      64.4       9.1         .86
91A - Medical Specialist                69.8       8.1         .85

(a) Fifteen task tests in each half, corrected to a total length of 30
tests.
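
The correction to full length mentioned in the table note is the Spearman-Brown formula. The sketch below shows the standard formula only; the function name `spearman_brown` and the example correlation are hypothetical, and the half-test correlations actually obtained in the field tests are not reported in this section.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Project the reliability of a test lengthened by length_factor.

    With length_factor=2 this is the usual split-half correction:
    r_full = 2 * r_half / (1 + r_half).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical correlation between two 15-task halves of a 30-task test.
r_half = 0.75
r_full = spearman_brown(r_half)
print(round(r_full, 3))
```

Because the corrected value is always at least as large as the half-test correlation when that correlation is positive, the coefficients in the table describe the full 30-task test rather than either 15-task half.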

Hands-On  Tests 

The  hands-on  tests  resulted  in  15  task  scores,  with  each  task  composed  of  a 
number  of  scorable  steps.  Steps  that  had  low  or  negative  correlations  with  the 
total  task  score  were  reviewed  to  identify  situations  where  performance 
prescribed  by  local  practices  was  as  correct  at  that  site  as  doctrinally 
prescribed  procedures.  Instructions  to  scorers  and  to  soldiers  were  revised  as 
necessary to ensure consistent scoring.

However, use of step statistics to revise task tests was purposely limited
because a task test usually represents an integrated procedure and removal of a
step which the Soldier's Manual specifies as a part of the job may result in
deleting a doctrinal requirement. Table 3-12 shows, for each MOS, the means,
standard deviations, and split-half reliability estimates of the hands-on
components across revised task tests.

In revising the hands-on tests, the goal for each MOS was a set of between
14 and 17 task tests. Field test experience indicated that reductions of this
magnitude would meet the time allotments for Concurrent Validation. Both the
field test results and additional systematic judgments by the project staff of
the "suitability" of the test for hands-on measurement were used to make these
reductions.

Table 3-12

Means, Standard Deviations, and Split-Half Reliabilities for
Hands-On Test Components for Nine MOS

                                               Mean    Standard    Split-Half
MOS                                      N     (%)     Deviation   Reliability(a)

13B - Cannon Crewman                    146    54.5      14.0         .82
64C - Motor Transport Operator          149    72.9       9.1         .59
71L - Administrative Specialist         126    62.1       9.9         .66
95B - Military Police                   113    70.8       5.8         .30
11B - Infantryman                       162    56.1      12.3         .49
19E - Armor Crewman                     106    81.1      11.8         .56
31C - Single Channel Radio Operator     140    80.1      10.7         .44
63B - Light Wheel Vehicle Mechanic      126    79.8       8.7         .49
91A - Medical Specialist                159    83.4      11.4         .35

(a) Calculated as 8-test score correlated with 7-test score, corrected to 15
tests.


The extent of the changes made on the tests, considering both obtained data
and informed judgments, was small. Among common task tests, judgments of
hands-on suitability resulted in deleting five tests. Additionally, in each MOS
two to five MOS-specific tasks were dropped.

Proponent Agency Review

Following on the adjustment steps described above, each MOS was covered
by a set of 15-17 hands-on tests, and a set of knowledge items that was 60 to
70 percent of the set that had been field tested. The array of hands-on and
knowledge tests for each MOS is summarized in Table 3-13.

The final step in the development of hands-on and knowledge tests was
Proponent agency review. This step was consistent with the procedure of
obtaining input from Army subject matter experts at each major developmental
stage.


The Proponent was asked to consider two questions: (a) Do the measures
reflect doctrine accurately, and (b) do the measures cover the major aspects
of the job? A Proponent representative was given copies of the measures;
staffing of the review was left to the discretion of the Proponent agent.

Table 3-13

Summary of MOS Task Tests Before Proponent Review

                    Total
MOS      Hands-On       Knowledge Items

13B         17             177/181
64C         16             168
71L         15             148
95B         15             210
11B         15             198
19E         15             196
31C         15             215
63B         15             196
91A         15             [illegible]

Item changes by Proponents generally affected fewer than 10 percent of
the items within an MOS and most such changes involved the wording, not the
basic  content,  of  the  item.  Changes  affecting  the  task  list  occurred  in  only 
three  MOS. 

In determining whether any of these task list changes constituted a
major shift in content coverage, special consideration was given to the
principle, applied in the initial task selection, that every cluster of tasks
be represented by at least one task. For MOS 71L and MOS 95B, each cluster
was still represented after the Proponent changes had been implemented. For
MOS 11B, the deletion of Perform PMCS on Tracked or Wheeled Vehicle and Drive
Tracked or Wheeled Vehicle left one cluster, consisting of tasks associated
with vehicle operation and maintenance, unrepresented. However, the Infantry
School's position was that tasks in this cluster did not represent the future
orientation of the 11B MOS, so this omission was considered acceptable under
the selection criteria.

A second condition in which strict adherence to Proponent suggestions
was not necessarily advisable was where the suggestions could not be easily
reconciled with documented Army doctrine. Where conflict with documentation
emerged, the discrepancy was pointed out; if the conflict was not resolved,
items were deleted.

Finally, if Proponent comments seemed to indicate a misunderstanding of
the intended purpose or content of test items, clarification was attempted.
The basic approach was to continue discussions until some mutually agreeable
solution could be found.


Task  Performance  Rating  Scales 

Inspection of the task performance rating data revealed large level
differences in the mean ratings provided by two or more raters of the same
soldier, and reliabilities varied widely across the tasks. During the Batch A
field tests, it was observed that supervisors and peers, confronted with only
the task title, might not have been entirely clear on the scope of tasks they
were rating. Low interrater reliability supported this observation.
Consequently, for the Batch B data collection for two MOS (31C and 19E), the
task statements were augmented with the brief descriptions of the tasks that
had been developed for the task clustering phase of development. However, this
modification did not appear to affect results from these MOS.

MOS-Specific Ratings

For each MOS, the reliability estimates computed for performance
dimension ratings provided by supervisors were compared with estimates for
dimension ratings provided by peers to identify problem dimensions. (See
Table 3-14 for a summary of the median reliability estimates as well as the
range of reliabilities for each MOS.)


For  most  MOS,  there  appears  to  be  no  consistent  pattern  when  reliability 
estimates  computed  for  supervisor  ratings  are  compared  with  those  computed  for 
peer  ratings.  Within  MOS  95B  one  performance  dimension,  Providing  Security, 
appeared to present problems for both rater groups. The interrater
reliability estimate computed separately for supervisors and peers was .39.
Therefore, the definition as well as the behavioral anchors for this particular
dimension  were  clarified. 

For  the  remaining  MOS-specific  rating  scales,  performance  dimensions 
with  low  reliability  estimates  for  supervisor  or  peer  ratings  were  identified. 
The  rating  scale  definitions  and  anchors  developed  for  these  dimensions  were 
reviewed,  and  revised  if  it  seemed  appropriate.  Since  very  little  leniency  or 
central  tendency  error  was  exhibited,  no  changes  were  made  in  the  scales  as 
the result of these data.

Revisions  Based  on  Proponent  Review 

For one MOS, Military Police (95B), the Proponent asked for more
extensive changes. Incumbents in this MOS provide combat and combat support
functions. Therefore, four performance dimensions describing these
requirements were added to the MOS-specific rating scales: Navigation
(Dimension H); Avoiding Enemy Detection (Dimension I); Use of Weapons and Other
Equipment (Dimension J); and Courage and Proficiency in Battle (Dimension K).
Definitions and behavioral anchors for these scales had been developed for the
Infantryman (11B) rating scales. Proponent representatives reviewed these
definitions and anchors and authorized including the same information in the
Military Police performance rating scales.

Analyses of the field test data from the Army-wide rating measures
focused on (a) distributions of the ratings, (b) interrater reliabilities, and
(c) intercorrelations among the rating scale dimensions.

Findings suggest that raters did not exhibit excessive leniency or
central tendency. Means were generally between 4 and 5 on the 7-point scale.
Reliabilities of the individual behavioral scales were respectable (.51 - .68,
median = .58) and composites of individual scales would be higher. The
single-scale Overall Effectiveness and NCO Potential reliabilities were
likewise reasonably high (median = .66). Interrater reliabilities for the
Army-wide common task ratings were lower (.33 - .60, median = .44). Supervisor
and peer ratings had very similar levels of interrater reliability.

Overall, the rating scale intercorrelations were not as high as are
usually found and were substantially lower than the individual scale
reliabilities. This is particularly significant because the scale reliabilities
(i.e., the intraclass r) incorporated rater differences as error while the
scale intercorrelations did not (i.e., all correlations were based on the same
set of raters).
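
An intraclass correlation that counts rater differences as error can be sketched with a one-way analysis of variance. This is an illustration under stated assumptions: the section does not say which form of the intraclass correlation was computed, each soldier is assumed to have the same number of raters, and the function name `one_way_icc` and the ratings are hypothetical.

```python
def one_way_icc(ratings):
    """Return (single_rater_icc, pooled_icc) from a one-way ANOVA.

    ratings: one list per soldier, each holding k ratings on the same scale.
    Differences between raters fall into the within-soldier (error) term.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    pooled = (ms_between - ms_within) / ms_between  # reliability of the k-rater mean
    return single, pooled


# Hypothetical 7-point ratings of five soldiers by two raters each.
ratings = [[4, 5], [6, 6], [3, 4], [5, 5], [2, 3]]
single, pooled = one_way_icc(ratings)
print(round(single, 3), round(pooled, 3))
```

Because any systematic leniency difference between raters inflates the within-soldier term, this kind of coefficient is a conservative benchmark against which the scale intercorrelations can be compared.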

As with the MOS-specific BARS scales, experience administering the
Army-wide rating scales during Batch A indicated that some soldiers had difficulty
with  the  amount  of  reading  required.  In  addition,  a  few  of  the  statements 
anchoring  the  different  effectiveness  levels  appeared  to  be  multidimensional. 

Between  the  Batch  A  and  Batch  B  administrations,  one  of  the  13  common 
task  scales  was  dropped  because  a  13th  scale  would  have  required  an  additional 
page  on  the  printed  version  of  the  scales.  The  task  dimension  that  had  the 
lowest  interrater  reliability  and  seemed  the  most  redundant  with  others  was 
eliminated for Batch B and the Concurrent Validation.

Finally,  after  the  instruments  were  submitted  to  Proponent  review,  the 
Army-wide  effectiveness  dimension  Maintaining  Living/Work  Areas  was  dropped  to 
reduce  the  time  required  to  complete  these  scales.  Experts  judged  that 
dimension  to  be  the  least  important  and  the  most  expendable. 

In  summary,  only  minimal  changes  were  made  to  the  Army-wide  rating 
scales  as  a  result  of  the  field  tests:  first,  eliminating  one  behavioral 
dimension  to  improve  administrative  efficiency;  second,  making  relatively 
minor  wording  changes  and  reducing  the  length  of  the  scale  anchors  to  lessen 
the  reading  difficulty  as  well  as  the  time  required  to  complete  the  scales. 

Combat  Performance  Prediction  Scale 

Forms A and B of the Combat Performance Prediction Scale were
administered at only one post during the Batch B field testing. The scale was
administered  to  peer  and  supervisor  raters  during  the  rating  sessions,  along 
with  the  Army-wide  and  MOS-specific  rating  scales. 

No meaningful differences were found in means and standard deviations
between supervisor and peer raters, or combat and noncombat MOS, or among the
six scale dimensions. All of the means are slightly above the scale midpoint
of 7.5. A very low reliability of .21 was obtained for the total score on all
76 items when ratings were pooled across raters and MOS.

A  set  of  40  items  was  selected  from  the  pool  of  76  items  on  the  basis  of 
content  domain  (dimension)  coverage  and  psychometric  properties.  Psychometric 
properties  considered  included  reliability,  item-dimension  correlation,  item- 
total  correlation,  and  means  and  standard  deviations  across  MOS  and  rater 
groups.  Responses  to  the  questions  concerning  rating  confidence  and  item 
applicability  were  also  considered. 

Total score reliability improved markedly (from .21 to .56) when the 40
best items from among the 76 were selected. Total scale coefficient alpha
remained at .94. The 40-item scale was judged to have sufficiently good
psychometric properties to justify its use for all MOS in the Concurrent
Validation data collection.
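
For readers unfamiliar with the statistic, coefficient alpha can be computed from a respondent-by-item score matrix as sketched below. This is only an illustrative sketch: the function name and the small ratings matrix are hypothetical and are not Project A data.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n_items = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [
        pvariance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    # Variance of the summed (total) score across respondents
    total_var = pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: five ratees scored on three items
example_ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = coefficient_alpha(example_ratings)
```

Because alpha depends on the ratio of summed item variances to total-score variance, it does not matter whether population or sample variances are used, provided the same kind is used for both.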

Administrative/Archival  Indicators 

The  Personnel  File  Information  Form  (a  self-report  of  201  File  informa¬ 
tion)  was  administered  at  every  field  test  site.  Using  the  same  form,  project 
staff extracted the same information from each soldier's 201 File, thus making
possible  a  comparison  of  the  two  approaches.  A  total  of  505  cases  were 
available  for  administrative  measures  analyses. 

Self-Report  vs.  File  Data 

For  the  Number  of  Awards  variable,  there  was  perfect  correspondence 
between  the  two  sources.  For  the  other  measures,  which  showed  varying  levels 
of  agreement,  a  greater  percentage  of  soldiers  were  reporting  more  occurrences 
of administrative measures being received than were found in their 201 Files
(e.g.,  see  Tables  3-15  and  3-16). 

This  situation  was  not  surprising  in  light  of  our  earlier  exploration  of 
201 Files. According to regulations, not all letters, certificates, Articles
15, etc. are placed in 201 Files, and some documents are removed after a
certain  period  of  time.  Also,  while  201  Files  are  the  most  timely  official 
source  of  information,  they  are  certainly  not  updated  daily.  Thus,  discrepan¬ 
cies  in  the  reported  direction  were  not  unexpected.  If  soldiers  had  reported 
more  positive  documents,  such  as  letters  and  certificates,  and  fewer  negative 
documents,  such  as  Articles  15,  when  compared  with  the  file  data,  then  the 
self-report  data  would  surely  be  suspect.  However,  soldiers  reported  receiv¬ 
ing  more  negative  as  well  as  more  positive  documents. 

Correlations  were  computed  between  the  six  administrative  measures  and 
Army-wide  supervisor  and  peer  ratings,  respectively.  Relationships  obtained 
from  the  self-report  approach  were  generally  higher  than  those  obtained  from 
201  Files. 

To further investigate why self-report differed from file information,
staff personnel conducted an outlier analysis by talking with individual
soldiers, trying to determine the extent to which they were counting the items
that we intended to be counted. If the soldier was interpreting the question
as we intended, we then asked for possible explanations as to why a
self-reported item was not found in the 201 File.

Table 3-15


Comparison of Letters/Certificates Information Obtained From
Self-Report and 201 Files: Batch A


Self-Report

201 File
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9 

2 

0 

n 

1 

190 
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80 
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3 
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■■ 
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2 

60 

21 

6 

0 

1 

0 

88 

3 

38 

11 

6 

3 

0 

0 

0 

58 

4 

24 

8 

5 

4 

1 
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1 

43 

5 

7 

4 

1 

0 

1 

0 

mm 

13 

6 

S 

1 

0 

1 

1 

0 

8 

7 

Jl 

J. 
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Ji 

_1 

Total 

392 

74 

24 

9 

4 

1 

1 

505 

Table  3-16 

Comparison of Articles 15/FLAG Information Obtained From
Self-Report and 201 Files: Batch A

201 File

Self-Report

1 

__a 

Total 

0 

320 

10 

2 

0 

332 

1 

73 

6 

4 

0 

63 

2 

38 

13 

2 

1 

54 

3 

18 

8 

1 

0 

27 

4 

2 

1 

1 

1 

5 

5 

1 

1 

0 

0 

2 

6 

1 

0 

0 

0 

1 

7 

-4 

4 

4 

Total 

55 

To 

2 

Some of the reasons confirmed earlier suspicions, such as "Counted
training certificates," "Counted certificate/letter that accompanied award,"
and "Recently received, paperwork not completed." Other reasons were
unexpected, such as "Counted Levy alert" as a FLAG action; a Levy alert is a
notification of an impending transfer. The lesson learned was a simple one:
For the Concurrent Validation data collection the self-report questions needed
to be more detailed, and even more clearly specified.

Revisions for Concurrent Validation

After the field tests of the Personnel File Information Form, it was
concluded that self-report yields the most timely and complete data. However,
a number of revisions were made in the self-report. The Military Training
Courses variable was dropped from consideration because it had little variance
and showed very low relationships with other measures. Further, since the
earlier 201 File-EMF comparison showed almost perfect agreement for the
Promotion Rate and Reenlistment Eligibility variables, and since monthly
updates of the EMF have become available and there is no longer a need to
collect this information from the field, the Reenlistment Eligibility question
and three questions used to compute Promotion Rate were dropped from the
Personnel File Information Form. Finally, as mentioned above, the remaining
questions were made more detailed.

Training  Achievement  Tests 

Training achievement data are for Batches A and B only (nine MOS).
These data were collected both from trainees as they completed their
respective courses and from job incumbents during the Batch A and B field
tests. Trainee and field test job incumbent results match; that is,
coefficient alpha for both the trainee and job incumbent samples was .88.
Mean correct for trainees was 53.9 percent, compared to 54.5 percent for job
incumbents.

Reduction  in  Number  of  Items  for  Concurrent  Validation 

Because of time constraints, the length for the Concurrent Validation
versions of the training tests would be limited to approximately 150 items.
To reduce the size of the item pool, any items that had been rated not
relevant to the job and also not relevant to training were dropped first.
Next, items that had been rated lowest in importance and/or highest in
difficulty were dropped. Because the training performance domain was assumed
to be multidimensional, items were not usually eliminated solely because of a
low correlation with the total test score. However, some items were dropped
that exhibited the three characteristics of (a) low pass rate, (b) negative
item-total correlation, and (c) a distractor or distractors with a high
positive item-total r. During the revision of the item pools, the relative
frequency of items in each job task duty area was maintained.
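
The three-part screening rule for dropping items can be illustrated with a short sketch. The cutoffs below (a pass rate under .30 and a distractor correlation above .20) are assumptions chosen for illustration only; the report does not state the thresholds actually used, and the function names and response data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_item(responses, key, totals, max_pass=0.30, min_distractor_r=0.20):
    """True only when an item shows all three warning signs:
    (a) low pass rate, (b) negative item-total correlation, and
    (c) at least one distractor with a high positive item-total correlation.
    The two thresholds are illustrative assumptions."""
    correct = [1 if r == key else 0 for r in responses]
    pass_rate = sum(correct) / len(correct)
    item_total_r = pearson(correct, totals)
    # Correlate "chose this distractor" (0/1) with total score, per distractor
    distractor_rs = [
        pearson([1 if r == option else 0 for r in responses], totals)
        for option in set(responses) - {key}
    ]
    return (
        pass_rate < max_pass
        and item_total_r < 0
        and any(r > min_distractor_r for r in distractor_rs)
    )


# Hypothetical data: high scorers prefer distractor "B" over the keyed "A"
totals = [90, 85, 80, 75, 40, 50]
suspect = flag_item(["B", "B", "B", "B", "A", "C"], "A", totals)
sound = flag_item(["A", "A", "A", "B", "B", "A"], "A", totals)
```

An item flagged this way is often miskeyed or ambiguous, which is why all three signs together were treated as grounds for removal even though a low item-total correlation alone was not.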

The numbers of items remaining on each test after the revisions had been
made  are  reported  in  Tables  3-17,  3-18,  and  3-19.  The  versions  to  be  used  for 
the  Concurrent  Validation  contained  the  number  of  items  shown  in  the  columns 
on  the  far  right.  The  tables  for  Batches  A  and  B  differ  slightly  from  the 
table  for  Batch  Z  because  many  of  the  Batch  A  and  B  item  reductions  were  made 
using  field  test  data,  which  were  not  obtained  for  Batch  Z.  Before  being 
administered  to  Job  incumbents  as  part  of  the  Concurrent  Validation,  each  item 
pool  was  submitted  to  the  appropriate  TRADOC  Proponent  for  review.  The  number 
of  items  sent  out  for  review  and  the  number  of  items  eliminated,  added,  or 
modified  as  a  result  of  this  review  are  also  summarized. 

Comparison of Initial and CV Item Pools

When initial item pool and Concurrent Validation versions are compared,
there is a small increase in the percentage of items rated Very Important and
a small decrease in the percentage rated Of Little Importance for both
the combat scenario (Very Important, 53.1 to 54.0%; Of Little Importance, 22.0
to 20.6%) and the garrison scenario (Very Important, 43.1 to 46.5%; Of
Little Importance, 11.2 to 8.3%). These changes are all in the expected
direction, given the procedures that were used to revise the initial item
pools.


For  the  version  of  the  tests  administered  as  part  of  the  Concurrent 
Validation,  the  distribution  across  relevance  categories  is  nearly  the  same  as 
for  the  original  item  pool. 
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Chapter  4 

THE  CONCURRENT  VALIDATION 


SAMPLES  AND  PROCEDURES 

The  nomenclature  for  MOS  groupings  was  changed  slightly  for  the  Concur¬ 
rent  Validation,  with  previously  designated  Batch  A  and  Batch  B  MOS  becoming 
Batch  A.  The  remaining  10  MOS  were  still  designated  as  Batch  Z,  as  listed  in 
Table  4-1. 


Table 4-1

MOS in the Concurrent Validation Phase of Project A


Batch A MOS                             Batch Z MOS

11B  Infantryman                        12B  Combat Engineer
13B  Cannon Crewman                     16S  MANPADS Crewman
19E  Armor Crewman                      27E  TOW/Dragon Repairer
31C  Single Channel Radio Operator      51B  Carpentry/Masonry Specialist
63B  Light Wheel Vehicle Mechanic       54E  Chemical Operations Specialist
64C  Motor Transport Operator*          55B  Ammunition Specialist
71L  Administrative Specialist          67N  Utility Helicopter Repairer
91A  Medical Specialist                 76W  Petroleum Supply Specialist
95B  Military Police                    76Y  Unit Supply Specialist
                                        94B  Food Service Specialist


* In the latter part of the CV phase, MOS 64C became MOS 88M.


Collection of CV data was planned to begin in May 1985 at 13 data
collection sites in CONUS and at sites in USAREUR, using procedures that had
been tried out and refined during the predictor and criterion field tests.
Data collection actually began 10 June 1985 and was concluded 13 November
1985. The data were collected by on-site teams made up of seven or eight
project staff members. At the peak of data collection, seven teams (one per
post) were operating.


Samples  Obtained 

The  final  sample  sizes  obtained  are  shown  by  post  and  by  MOS  in  Table 
4-2.  A  target  sample  size  of  600-700  job  incumbents  per  MOS  was  the  overall 
goal,  but  in  some  MOS,  the  sample  was  smaller,  either  because  the  MOS  simply 
is  not  that  large  or  because  not  enough  incumbents  with  the  appropriate 
accession  dates  were  available  at  the  various  sites. 


Predictor and Criterion Measures

The full array of predictor and criterion measures used in the Concurrent
Validation is described at some length in the FY86 Annual Report
(Campbell, 1987) and in the development and field test reports for each major
type of instrument. The variables in each domain are listed in Tables 4-3 and
4-4. In the Concurrent Validation one-half day was devoted to predictor
measurement and one and one-half days to criterion measurement.

While the same predictor battery was used for all the MOS, the criterion
measures used for Batch A MOS were different from those used for MOS in Batch
Z. The major distinction is that the MOS-specific job performance and job
knowledge measures were not developed for the 10 MOS in Batch Z. For these
jobs only Army-wide measures and the training achievement tests were
administered.


Data  Collection  Team  Composition  and  Training 

Each data collection team was composed of a test site manager and six or
seven project staff members who were responsible for administering tests and
rating scales. The teams were made up of a combination of regular project
staff and individuals (e.g., graduate students) specifically hired for the
data collection effort. The test site manager had participated extensively in
the field tests. The team was assisted by eight NCO scorers (for the hands-on
tests), one company-grade officer POC, and up to five NCO support personnel,
all provided by the post. The project data collection teams were given 3 days
of training at a central location (Alexandria, VA). The eight NCO scorers who
were required to administer and score the hands-on tests were recruited and
trained at each post, using procedures very similar to those used in the
criterion field tests. Training required one full day during which scorers
had the opportunity to take the tests themselves and undergo multiple practice
trials in scoring each task, with feedback from the project staff.

Concurrent  Validation  Analyses 

The  basic  analytic  steps  for  the  Concurrent  Validation  data  were  as 
outlined  below.  The  overall  goal  was  to  move  systematically  from  the  raw 
data,  which  consist  of  thousands  of  elements  of  information  on  each  individ¬ 
ual,  to  estimates  of  selection  validity,  differential  validity,  and  selecti¬ 
on/classification  utility. 

General Steps

The general steps in the analysis were as follows:

(1) Prepare and edit individual data files.

(2) Determine basic scores for the predictor variables.

(3) Determine basic scores for the criterion variables.

(4) Describe the latent structure of the predictor and criterion
covariance matrices.

Table 4-3

Summary of Predictor Measures Used in Concurrent Validation:
The Trial Battery

                                                                 Number
                                                                of Items

COGNITIVE PAPER-AND-PENCIL TESTS

Reasoning Test (Induction-Figural Reasoning)                       30
Object Rotation Test (Spatial Visualization-Rotation)              90
Orientation Test (Spatial Orientation)                             24
Maze Test (Spatial Orientation)                                    24
Map Test (Spatial Orientation)                                     20
Assembling Objects Test (Spatial Visualization-Rotation)           32

COMPUTER-ADMINISTERED TESTS

Simple Reaction Time (Processing efficiency)                       15
Choice Reaction Time (Processing efficiency)                       30
Memory Test (Short-term memory)                                    36
Target Tracking Test 1 (Psychomotor precision)                     18
Perceptual Speed and Accuracy Test (Perceptual speed               35
  and accuracy)
Target Tracking Test 2 (Two-hand coordination)                     18
Number Memory Test (Number Operations)                             28
Cannon Shoot Test (Movement judgment)                              36
Identification Test (Perceptual speed and accuracy)                36
Target Shoot Test (Psychomotor precision)                          30

NON-COGNITIVE PAPER-AND-PENCIL INVENTORIES

Assessment of Background and Life Experiences (ABLE)              209
    Adjustment
    Dependability
    Achievement
    Physical Condition
    Leadership
    Locus of Control
    Agreeableness/Likability

Army Vocational Interest Career Examination (AVOICE)              176
    Realistic Interests
    Conventional Interests
    Social Interests
    Enterprising Interests
    Artistic Interests


Table 4-4


Summary of Criterion Measures Used in Batch A and Batch Z
Concurrent Validation Samples


Performance Measures Common to Batch A and Batch Z

• Army-wide rating scales (all obtained from both supervisors and peers).

  - Ten behaviorally anchored rating scales (BARS) designed to measure
    factors of non-job-specific performance.

  - Single scale rating of overall effectiveness.

  - Single scale rating of NCO potential.

• Combat Prediction scale containing 40 items.

• Paper-and-pencil tests of training achievement developed for each of the
  19 MOS (130-210 items each).

• Personnel File Information form developed to gather objective archival
  records data (awards and letters, rifle marksmanship scores, physical
  training scores, etc.).

Performance Measures for Batch A Only

• Job sample (hands-on) tests of MOS-specific task proficiency.

  - Individual is tested on each of 15 major job tasks in an MOS.

• Paper-and-pencil job knowledge tests designed to measure task-specific
  job knowledge.

  - Individual is scored on 150 to 200 multiple-choice items representing
    30 major job tasks. Ten to 15 of the tasks were also measured hands-on.

• Rating scale measures of specific task performance on the 15 tasks also
  measured with the knowledge tests. Most of the rated tasks were also
  included in the hands-on measures.

• MOS-specific behaviorally anchored rating scales (BARS). From six to 12
  BARS were developed for each MOS to represent the major factors that
  constitute job-specific technical and task proficiency.

Performance Measures for Batch Z Only

• Additional Army-wide rating scales (all obtained from both supervisors
  and peers).

  - Ratings of performance on 11 common tasks (e.g., basic first aid).

  - Single scale rating on performance of specific job duties.

Auxiliary Measures Included in Criterion Battery

• A Job History Questionnaire which asks for information about frequency
  and recency of performance of the MOS-specific tasks.

• Army Work Environment Questionnaire - 53 items assessing
  situational/environmental characteristics, plus 46 items dealing with
  leadership.

• Measurement Method Rating obtained from all participants at the end of
  the final testing session.


(5) Determine how well each predictor construct predicts each criterion
factor (for each MOS).

(6) Determine incremental validities (if any) of new predictors over
ASVAB for each criterion factor within each MOS.
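
Step (6) can be illustrated in its simplest form, with one ASVAB composite and one new predictor composite. In that two-predictor case the multiple correlation follows directly from the three zero-order correlations, and incremental validity is the gain in R over the ASVAB validity alone. The sketch below is an illustration under that simplifying assumption, not the project's analysis code; the function names and correlation values are hypothetical.

```python
from math import sqrt


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with predictors 1 and 2,
    computed from the zero-order correlations:
    R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (
        1 - r_12 ** 2
    )
    return sqrt(r_squared)


def incremental_validity(r_y_asvab, r_y_new, r_asvab_new):
    """Gain in multiple R when the new predictor is added to ASVAB."""
    full_r = multiple_r_two_predictors(r_y_asvab, r_y_new, r_asvab_new)
    return full_r - abs(r_y_asvab)


# Hypothetical values: the new predictor is uncorrelated with ASVAB
gain_independent = incremental_validity(0.3, 0.4, 0.0)
# Hypothetical values: the new predictor is redundant with ASVAB
gain_redundant = incremental_validity(0.5, 0.3, 0.6)
```

The second example shows why a new test can be valid on its own and still add nothing: when its correlation with the criterion equals the ASVAB validity times the ASVAB-predictor correlation, the gain is zero.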

Missing Values

Because extensive multivariate analyses requiring complete data were to
be performed, the treatment of missing values was an important concern (Young,
Harris, Hoffman, Houston, & Wise, 1987). Cases with significant amounts of
missing data (10% for written tests, 15% for hands-on tests and rating scales)
were dropped from the analysis of that instrument. In cases where lesser
amounts of data were missing, either examinee means or variable means were
substituted for missing values. For these data, the PROC IMPUTE statistical
procedure was used to derive proxy values for missing scale scores, and for
missing step scores in the hands-on analyses. These procedures enabled
retention of 90-95 percent of the soldiers in each MOS.

The PROC IMPUTE procedure essentially substitutes for the missing
variable a value observed for a respondent who is very similar to the
examinee. This procedure has been shown to be significantly better than
ordinary least squares (OLS) regression procedures (e.g., BMDPAM) in
reproducing correlation and variance estimates, as the regression approaches
tend to underestimate variances and to spuriously inflate correlations.
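
The two ideas in this subsection, screening out cases with too much missing data and borrowing a value from a similar respondent, can be sketched as follows. This is a generic nearest-donor illustration and is not the PROC IMPUTE algorithm itself; the function names, the squared-distance similarity measure, and the treatment of the cutoff as "strictly greater than" are all assumptions.

```python
def too_much_missing(values, limit):
    """True when the share of missing (None) values exceeds the limit,
    e.g. limit=0.10 for written tests or 0.15 for hands-on tests."""
    missing = sum(v is None for v in values)
    return missing / len(values) > limit


def impute_from_nearest_donor(records, target_index, missing_key):
    """Return the value of missing_key observed for the complete record
    most similar to records[target_index] on the target's other variables.
    Returns None if no usable donor exists."""
    target = records[target_index]
    shared_keys = [
        k for k, v in target.items() if k != missing_key and v is not None
    ]
    best_donor, best_distance = None, float("inf")
    for i, donor in enumerate(records):
        if i == target_index or donor.get(missing_key) is None:
            continue  # a donor must have the value we need
        if any(donor.get(k) is None for k in shared_keys):
            continue  # a donor must be comparable on all shared variables
        distance = sum((donor[k] - target[k]) ** 2 for k in shared_keys)
        if distance < best_distance:
            best_donor, best_distance = donor, distance
    return None if best_donor is None else best_donor[missing_key]


# Hypothetical scale scores; the third soldier is missing scale "c"
records = [
    {"a": 10, "b": 20, "c": 30},
    {"a": 50, "b": 60, "c": 70},
    {"a": 11, "b": 21, "c": None},
]
proxy = impute_from_nearest_donor(records, 2, "c")
```

Because the substituted value is one that was actually observed, this kind of procedure preserves the spread of the variable, whereas a regression prediction pulls imputed values toward the regression line.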

Predictor  Score  Analyses 

After  data  preparation,  basic  item  analyses,  and  the  initial  score 
generation,  the  principal  objectives  for  the  predictor  analyses  were  to 
generate  the  basic  summary  scores  that  would  enter  the  initial  prediction 
equation for each MOS. The basic steps were as follows:

(1) Using the initial scores, conduct item/scale score analyses.

(2)  Compute  scale  reliabilities  and  descriptive  statistics. 

(3)  Develop  predictor  construct  scores  via  factor  analysis. 

(4)  Estimate  predictor  factor  (construct)  scores  via  a  simple 
weighted  sum. 
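
Step (4) can be sketched as a weighted sum of standardized scores. The sketch below defaults to unit weights; the function names, the scale names, and the score values are hypothetical, and any non-unit weights would have to come from the factor analysis in step (3).

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_scores(score_columns, weights=None):
    """Weighted sum of standardized score columns, one value per examinee.

    score_columns maps a scale name to its list of raw scores.
    With weights=None every scale receives a unit weight."""
    names = list(score_columns)
    if weights is None:
        weights = {name: 1.0 for name in names}
    z = {name: standardize(score_columns[name]) for name in names}
    n_examinees = len(z[names[0]])
    return [
        sum(weights[name] * z[name][i] for name in names)
        for i in range(n_examinees)
    ]


# Hypothetical raw scores for three examinees on two scales
raw = {"maze": [10, 20, 30], "map": [1, 2, 3]}
composite = composite_scores(raw)
```

Standardizing first keeps a scale with a large raw-score range from dominating the sum simply because of its metric.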

Criterion  Score  Analyses 

After  data  preparation  had  been  completed,  the  objectives  for  the 
criterion analyses were to identify an array of basic criterion variables
(i.e.,  scores),  investigate  the  latent  structure  of  those  variables,  and 
determine  the  principal  criterion  component  scores. 

Predictor/Criterion Interrelationships

After the above steps were carried out, the basic variables and the
best-fitting model for both the predictors and the performance measures had
been identified. They provided the variables to be used for establishing the
selection/classification validity of the new predictor battery and for
determining differential validity across criterion constructs, across jobs,
and across subgroups.


DEVELOPMENT  OF  PREDICTOR  SCORES  AND  COMPOSITES 
Basic  Predictor  Scores  for  the  Trial  Battery 

A total of 69 scores were generated from the Trial Battery. Forty-three
came from the non-cognitive inventories: the Assessment of Background and Life
Experiences (ABLE), the Army Vocational Interest Career Examination (AVOICE),
and the Job Orientation Blank (JOB), which had been included in the AVOICE for
the Trial Battery. Six scores came from the six paper-and-pencil cognitive
tests. For the computer-administered tests, a number of alternative methods
of scoring, such as slopes, intercepts, and different methods of computing
means (e.g., different procedures for trimming items before computing means),
were evaluated. Generally speaking, the computerized test scores selected for
additional analyses were those that were most reliable and could be
interpreted in a straightforward way.

The Ns, means, standard deviations, reliabilities, and uniqueness (from
ASVAB) coefficients for scores on the cognitive paper-and-pencil tests are
shown in Table 4-5. Similar data are shown in Tables 4-6 and 4-7 for the
computer-administered tests, and in Tables 4-8, 4-9, and 4-10 for the ABLE,
AVOICE, and JOB scale scores. Uniqueness coefficients are not shown for these
instruments, but range from .40 to .88, with a median U² of .79 for ABLE, .80
for AVOICE, and .57 for JOB.
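
The uniqueness coefficient is not defined in this section. A common definition, assumed here, is the reliable variance of a score that is not shared with the ASVAB: the score's reliability minus its squared multiple correlation with the ASVAB subtests. The sketch below only illustrates that assumed definition; the function name and the input values are hypothetical.

```python
def uniqueness(reliability, r_squared_with_asvab):
    """Assumed definition: U^2 = reliability - R^2, where R^2 is the squared
    multiple correlation of the score with the ASVAB subtests."""
    return reliability - r_squared_with_asvab


# Hypothetical inputs: reliability of .91 and R^2 of .26 with ASVAB
u_squared = uniqueness(0.91, 0.26)
```

Under this definition a highly reliable test can still have low uniqueness if most of its reliable variance is already captured by the ASVAB.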

In general, the battery exhibited quite good psychometric properties,
with the exception of low reliabilities on some computer-administered test
scores. The low reliabilities tended to be characteristic of the proportion
correct scores, which was expected. That is, the items can almost always be
answered correctly if the examinee takes enough time, which restricts the
range on the proportion correct scores. However, it increases the variance
(and reliability) on the decision time scores.

Formation of Predictor Composites

Preliminary analyses of the Trial Battery predictor tests indicated that
reliable predictor scores could be computed from the six spatial tests (i.e.,
the paper-and-pencil cognitive tests), the 10 computerized tests, and the
temperament, vocational interest, and job reward inventories (Peterson, et
al., 1987). In addition, scores from the nine ASVAB subtests were available
from Army records. Table 4-11 shows how those predictor scores were
distributed among various domains within the predictor space. The ASVAB
subtests measured nine cognitive abilities. The paper-and-pencil cognitive
tests measured six different aspects of spatial ability. The 10 computerized
tests yielded 20 measures of perceptual-psychomotor abilities. The ABLE
provided measures of 11 temperament/biographical traits. The AVOICE assessed
22 vocational interests. Finally, the JOB measured six types of job reward
preferences.

Table 4-5

Concurrent Validity Data Analysis: Statistics for Paper-and-Pencil
Cognitive Tests


                                               Split-Half  Test-Retest  Uniqueness
Test                    N       Mean     SD    Reliab.(a)  Reliab.(b)   Estimate

Assembling Objects    9,343     23.3    6.71      .91         .70         .65
Object Rotation       9,345     62.4   19.06      .99         .72         .81
Maze                  9,344     16.4    4.77      .96         .70         .74
Orientation           9,341     11.0    6.18      .89         .70         .60
Map                   9,343      7.7    5.51      .90         .78         .46
Reasoning             9,332     19.1    5.67      .87         .65         .54

(a) Split-half reliability estimates were calculated using the odd-even
procedure with the Spearman-Brown correction for test length.

(b) Test-retest reliability estimates are based on a sample of 468 to 487
subjects. The test-retest interval was 2 weeks.
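
The odd-even procedure with the Spearman-Brown correction mentioned in the table note can be sketched as follows. The function names and the 0/1 item data are hypothetical and are used only to show the calculation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores holds one list of item scores per examinee."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_half, even_half))


# Hypothetical right/wrong (1/0) scores for three examinees on four items
example_items = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
reliability = split_half_reliability(example_items)
```

The correction is needed because the raw odd-even correlation describes a test only half as long as the one actually administered.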


Because of multicollinearity and the ratio of number of variables to
sample size, 78 separate predictor scores were too many to retain.
Consequently, the 78 predictor test and scale scores were combined into 24
predictor composites before predictor-criterion relationships were computed.
With one exception (which will be noted), these composites were formed simply
by summing standardized test or scale scores.

Table 4-6

Concurrent Validity Data Analysis: Statistics for Computerized
Psychomotor Tests


                                                       Odd-Even  Test-Retest
Test and Measure(a)             N       Mean     SD    Reliab.   Reliab.(b)  Uniqueness

Target Tracking 1
  Mean Log (Distance + 1)     9,251     2.98     .49     .98        .74         .82

Target Tracking 2
  Mean Log (Distance + 1)     9,239     3.70     .51     .98        .85         .79

Target Shoot
  Mean Log (Distance + 1)     8,892     2.17     .24     .74        .37         .70
  Mean Time to Fire           8,892   235.39   47.78     .85        .58         .78

Cannon Shoot
  Mean Absolute Time          9,234    43.94    9.57     .65        .52         .56
    Discrepancy

(a) Time-to-fire and time-discrepancy measures are in hundredths of seconds.
Logs are natural logs.

(b) Test-retest reliability estimates are based on sample sizes of 468 to 487.
The test-retest interval was 2 weeks.

Table 4-7


Concurrent Validity Data Analysis: Statistics for Computerized
Perceptual Tests


                                                       Odd-Even  Test-Retest
Test and Measure                N      Mean(a)   SD    Reliab.   Reliab.(b)  Uniqueness

Simple Reaction Time (SRT)
  Decision Time Mean          9,255    31.84   14.82     .88        .23         .87
  Proportion Correct          9,255      .98     .04     .46        .02         .44

Choice Reaction Time (CRT)
  Decision Time Mean          9,269    40.93    9.77     .97        .69         .93
  Proportion Correct          9,269      .98     .03     .57        .23         .55

Short-Term Memory (STM)
  Decision Time Mean          9,149    87.72   24.03     .96        .66         .93
  Proportion Correct          9,149      .89     .08     .60        .41         .55

Perceptual Speed & Accuracy (PSA)
  Decision Time Mean          9,244   236.91   63.38     .94        .63         .92
  Proportion Correct          9,244      .87     .08     .65        .51         .61

Target Identification (TID)
  Decision Time Mean          9,105   193.65   63.13     .97        .78         .83
  Proportion Correct          9,105      .91     .07     .62        .40         .59

Number Memory
  Final Response Time Mean    9,099   160.70   42.63     .88        .62         .67
  Input Response Time Mean    9,099   142.84   55.24     .95        .47         .85
  Operations Response         9,099   233.10   79.72     .93        .73         .66
    Time Mean(c)
  Proportion Correct          9,099      .90     .09     .59        .53         .39

SRT-CRT-STM-PSA-TID
  Pooled Mean                 8,962    33.61    8.03     .74        .66         .71
    Movement Time(c)

(a) Times are given in hundredths of seconds.

(b) N = 460-479 for test-retest correlations. The test-retest interval was
2 weeks.

(c) Coefficient Alpha reliability estimates.

Table 4-8

ABLE Scale Statistics for Total Group(a): Trial Battery


                                                     Median    Internal
                                                     Item-     Consistency  Test-
                          No.                        Total     Reliability  Retest
ABLE Scale               Items     N     Mean   SD   Corr.     (Alpha)      Reliab.(b)

Substantive Scales

Emotional Stability        17    8,522   39.0  5.45   .39        .81          .74
Self-Esteem                12    8,472   28.4  3.70   .39        .74          .78
Cooperativeness            18    8,494   41.9  5.28   .39        .81          .76
Conscientiousness          15    8,504   35.1  4.31   .34        .72          .74
Nondelinquency             20    8,482   44.2  5.91   .36        .81          .80
Traditional Values         11    8,461   26.6  3.72   .36        .69          .74
Work Orientation           19    8,498   42.9  6.06   .41        .84          .78
Internal Control           16    8,485   38.0  5.11   .39        .78          .69
Energy Level               21    8,488   48.4  5.97   .38        .82          .78
Dominance                  12    8,477   27.0  4.28   .44        .80          .79
Physical Condition          6    8,500   14.0  3.04   .60        .84          .85

Response Validity Scales

Unlikely Virtues           11    8,511   15.5  3.04   .34        .63          .63
Self-Knowledge             11    8,508   25.4  3.33   .36        .65          .64
Non-Random Response         8    8,559    7.7   .59   --         --           .30
Poor Impression            23    8,492    1.5  1.85   .20        .63          .61

(a) Total group after screening for missing data and random responding.

(b) N = 408-414 for test-retest correlation. Test-retest interval was
2 weeks.

Table 4-9


AV3XCE  Seal#  Statistics  for  Total  Group*}  Trial  Battery 


AVOICE  Scale 

No. 

Items 

14 

H 

Mean 

-SL- 

Median 

Item- 

Total 

Corre- 

laiion 

Internal 

Consis¬ 

tency 

Relia- 

bility 

Test- 

Retest 

Relia- 

bilitv* 

Clerical/ 

8.463 

39.6 

10.81 

.67 

(Alpha) 

.92 

.78 

Administrative 

Mechanics 

10 

8,382 

32.1 

9.42 

.80 

.94 

.82 

Heavy  Construction 

13 

8,488 

39.3 

10.54 

.68 

.92 

.84 

Electronics 

12 

6,359 

38.4 

10.22 

.70 

.94 

.81 

Combat 

10 

8,466 

26.5 

8.35 

.65 

.90 

.73 

Medical  Services 

12 

8,364 

36.9 

9.54 

.68 

.92 

.78 

Rugged  Individualism 

15 

8,396 

53.3 

11.44 

.56 

.90 

.81 

Leadership/Guidance 

12 

8,444 

40.1 

8.63 

.62 

.69 

.72 

Law  Enforcement 

8 

8,471 

24.7 

7.37 

.65 

.89 

.84 

Food  Service  - 

8 

8,472 

20.2 

6.50 

.67 

.89 

.75 

Professional 

Firearms  Enthusiast 

7 

8,397 

23.0 

6.36 

.66 

.89 

.80 

Science/Chemical 

6 

8,468 

8.493 

16.9 

5.33 

.70 

.85 

.74 

Orafting 

6 

19.4 

4.97 

.66 

.84 

.74 

Audiographics 

5 

8,473 

17.6 

4.09 

.69 

.83 

.75 

Aesthetics 

5 

6,413 

14.2 

4.13 

.59 

.69 

.73 

Data  Processing 

4 

8,224 

14.0 

3.99 

.78 

.90 

.70 

Food  Service  - 

3 

8,304 

5.1 

2.08 

.54 

.73 

.56 

Employee 

Mathematics 

3 

8,421 

9.6 

3.09 

.78 

.88 

.75 

Electronic 

6 

8,403 

18.4 

4.66 

.60 

.83 

.68 

Communications 

Warehousing/Shipping 

2 

8,407 

5.8 

1.75 

.44 

.61 

,54 

Fire  Protection 

2 

8.431 

6.1 

1.96 

.62 

.76 

.67 

Vehicle/Equipment 

3 

6,378 

8.8 

2.65 

.51 

.70 

.68 

Operator 


^total  group  a^ter  screen-lng  for  missing  data  and  random  respond'ing. 

*N  ■  389-409  for  test-retest  correlation.  Test-retest  interval  was  2  weeks. 
Table 4-10

JOB Scale Statistics for Total Group(a): Trial Battery

                                                       Median   Internal
                                                       Item-    Consis-
                                                       Total    tency
                     No.                               Corre-   Relia-
JOB Scale            Items       N    Mean      SD     lation   bility

Job Security           10    7,809    43.6    4.51       .54      .84
Job Pride               5    7,817    21.6    2.33       .43      .67
Serving Others          3    7,784    12.1    1.83       .52      .66
Autonomy                4    7,817    15.1    2.29       .31      .50
Routine                 4    7,707     9.6    2.30       .25      .46
Ambition                3    7,751    12.4    1.63       .35      .49

(a) Total group after screening for missing data and random responding.


Table 4-11

Assessment of the Selected Measures with Reference to the Predictor Space

                                                              Number of      Number of
                                                              Test or        Composite
Predictor Domain          Measures(a)                         Scale Scores   Scores

General Cognitive         Armed Services Vocational           9 Subtests     4
Ability                   Aptitude Battery (ASVAB)

Spatial Ability           Spatial Test Battery                6 Tests        1

Perceptual-               Computerized Battery                20 Tests       6
Psychomotor Abilities

Temperament               Assessment of Background            11 Scales(b)   4
                          and Life Experiences (ABLE)

Vocational                Army Vocational Interest            22 Scales      6
Interests                 Career Examination (AVOICE)

Job Reward                Job Orientation Blank (JOB)         6 Scales       3
Preferences

(a) All measures except the ASVAB were developed specifically for Project A.
(b) The ABLE included four additional response validity scales.
Three goals guided the formation of composite scores. First, there was
an attempt to keep the number of composites to a minimum. Second, homogeneity
within composites was maximized. Third, even if two or more test or scale
scores were reasonably highly correlated and had similar patterns of factor
loadings, they were grouped into the same composite only if they were expected
to have similar patterns of correlations with job performance.

Figure 4-1 shows how the nine ASVAB subtests were combined into four
composite scores: Technical, Quantitative, Verbal, and Speed. In computing
the Technical composite score, the Electronics Information subtest received a
weight of one-half unit while the Mechanical Comprehension and Auto-Shop
subtests received unit weights, because a factor analysis indicated that the
loading of the Electronics Information subtest on the Technical factor of the
ASVAB was only about one-half as large as the loadings of the Mechanical
Comprehension and Auto-Shop subtests.
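
The weighting rule described above can be sketched as a weighted sum. This is a minimal illustration only: the function name, the dictionary keys, and the use of standardized (z) scores are assumptions, since the text specifies just the relative weights (one unit, one unit, and one-half unit).

```python
# Relative weights taken from the text: unit weights for Mechanical
# Comprehension and Auto-Shop, one-half unit for Electronics Information.
# The key names are hypothetical labels chosen for this sketch.
TECHNICAL_WEIGHTS = {
    "mechanical_comprehension": 1.0,
    "auto_shop": 1.0,
    "electronics_information": 0.5,
}


def technical_composite(z_scores):
    """Return the weighted sum of (assumed standardized) subtest scores."""
    return sum(weight * z_scores[name] for name, weight in TECHNICAL_WEIGHTS.items())


# Hypothetical examinee, one standard deviation above the mean on two subtests.
example = {
    "mechanical_comprehension": 1.0,
    "auto_shop": -0.5,
    "electronics_information": 1.0,
}
print(technical_composite(example))  # 1.0 - 0.5 + 0.5 = 1.0
```

Halving the Electronics Information weight keeps that subtest from contributing as much to the composite as the two subtests that load more strongly on the Technical factor.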

The six spatial tests were all highly intercorrelated and, as Figure 4-2
shows, were combined into a single composite score. Six composite scores were
computed from the 20 perceptual-psychomotor test scores from the computerized
battery (Figure 4-3). Four temperament composites were computed from the ABLE
scales (see Figure 4-4), and six vocational interest composites were computed
from the 21 AVOICE scales (see Figure 4-5). Finally, the six scales of the
JOB were combined into three composites (Figure 4-6).

All subsequent predictor validation analyses were based on these 24
basic scores. They are portrayed in summary form in Table 4-12. The tests
and inventory scales from the Trial Battery which were used to form simple sum
factor scores are listed under each factor title.


DEVELOPMENT  OF  BASIC  JOB  PERFORMANCE  CRITERION  SCORES 

During the Concurrent Validation, Project A collected 12 hours of
criterion data from 5,000 incumbents in nine MOS (Batch A) and 4 hours of data
from 4,500 incumbents in 10 MOS (Batch Z). For each individual in Batch A
there were approximately 350 knowledge test items, 15 hands-on task scores, 95
rating scales from each of three raters, and 6 administrative indexes. The
first major step in reducing these multiple bits of information to scores on
the major components of performance was the development of the "basic"
criterion scores that could be used in covariance analyses of the latent
structure. The procedures that the project staff used to obtain these basic
scores are summarized below.

Criterion Scores for the Hands-On and Knowledge Tests

To reduce the number of criterion scores derived from the hands-on tests
and job knowledge tests, the task domains for each of the nine Batch A MOS
were reviewed by project staff and tasks were clustered into a set of
functional categories on the basis of task content. Ten of the categories applied
to all MOS and consisted primarily of common tasks. In addition, each MOS,
Figure 4-2. Formation of spatial ability composite from spatial
battery test scores. (Diagram labels: Map, Maze, Object Rotation,
Orientation, Figural Reasoning; composite: Spatial.)

NOTE: One computer test score, Choice Reaction Time (Decision Time Minus Simple Reaction
Time), was not used in computing composite scores.

Figure 4-3. Formation of perceptual-psychomotor ability composites from
computerized battery test scores.

NOTE: Four ABLE scales (Dominance, Traditional Values, Cooperativeness, and Internal Control)
were not used in computing composite scores.

Figure 4-4. Formation of temperament composites from ABLE scale scores.
(Diagram labels: Work Orientation, Energy Level, Achievement Orientation.)

Figure 4-5. Formation of vocational interest composites from
AVOICE scale scores.

Figure 4-6. Formation of job reward preference composites from
JOB scale scores.
Table 4-12

Ability, Temperament, and Interest Factors Identified via Analysis of the Concurrent
Validation Data on 9,430 MOS Incumbents


FROM ASVAB SUBTESTS

Technical Factor
    Mechanical Comprehension
    Auto-Shop Information
    Electronics Information
Quantitative Factor
    Math Knowledge
    Arithmetic Reasoning
Verbal Factor
    Verbal
    General Science
Speed Factor
    Coding Speed
    Number Operations

FROM PAPER-AND-PENCIL TESTS

Overall Spatial Factor
    Assembling Objects Test
    Map Test
    Maze Test
    Object Rotation Test
    Orientation Test
    Figural Reasoning Test

FROM COMPUTERIZED MEASURES

Psychomotor Factor
    Cannon Shoot Test (Time score)
    Target Shoot Test (Time to fire)
    Target Shoot Test (Log distance)
    Target Tracking 1 (Log distance)
    Target Tracking 2 (Log distance)
    Pooled Mean Movement Time
Perceptual Speed Factor
    Short-Term Memory Test (Decision time)
    Perceptual Speed & Accuracy Test (Decision time)
    Target Identification Test (Decision time)
Perceptual Accuracy Factor
    Short-Term Memory Test (Percent correct)
    Perceptual Speed & Accuracy Test (Percent correct)
    Target Identification Test (Percent correct)
Number Speed/Accuracy Factor
    Number Memory Test (Percent correct)
    Number Memory Test (Initial decision time)
    Number Memory Test (Mean operations time)
    Number Memory Test (Final decision time)
Simple Reaction Speed Factor
    Choice Reaction Time (Decision time)
    Simple Reaction Time (Decision time)
Simple Reaction Accuracy Factor
    Choice Reaction Time (Percent correct)
    Simple Reaction Time (Percent correct)

FROM NON-COGNITIVE INVENTORIES

Achievement Factor
    Self-Esteem scale
    Work Orientation scale
    Energy Level scale
Dependability Factor
    Conscientiousness scale
    Non-delinquency scale
Adjustment Factor
    Emotional Stability scale
Physical Condition Factor
    Physical Condition scale
Skilled Technical Interest Factor
    Clerical/Administrative
    Medical Services
    Leadership/Guidance
    Science/Chemical
    Data Processing
    Mathematics
    Electronic Communications
Structural/Machines Interest Factor
    Mechanics
    Heavy Construction
    Electronics
    Vehicle/Equipment Operator
Combat-Related Interest Factor
    Combat
    Rugged Individualism
    Firearms Enthusiast
Audiovisual Arts Interest Factor
    Drafting
    Audiographics
    Aesthetics
Food Service Interest Factor
    Food Service Professional
    Food Service Employee
Protective Services Interest Factor
    Law Enforcement
    Fire Protection
Preference for Organizational and Co-worker Support
    Job Pride
    Job Security
    Serving Others
    Ambition
Preference for Routine Work
    Routine
Preference for Job Autonomy
    Autonomy
except for 11B (Infantryman) and 64C (Motor Transport Operator), had two to
five MOS-specific categories. The ten common categories were sufficient to
account for all tasks in 11B and 64C.

After category definitions had been written, three members of the
project staff independently classified the 30 tasks in each MOS into one of
the ten common categories or into an MOS-specific category. The level of
perfect agreement in the assignment of tasks to categories was over 90 percent
in every MOS. These same functional categories were used by the project staff
to sort the school knowledge test items. The titles of the functional
category definitions are presented in Figure 4-7.
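
As an illustration of the "perfect agreement" figure reported above, the sketch below counts the share of tasks that all three judges placed in the same category. The data are hypothetical and the function name is an assumption; the report does not describe how the percentage was tallied beyond the wording above.

```python
def perfect_agreement_rate(classifications):
    """Share of tasks on which every judge chose the same category.

    classifications: one tuple per task, holding each judge's category label.
    """
    agreed = sum(1 for judges in classifications if len(set(judges)) == 1)
    return agreed / len(classifications)


# Hypothetical data: 30 tasks, three judges, 28 tasks classified identically.
tasks = [("First Aid", "First Aid", "First Aid")] * 28 + [
    ("Navigate", "Navigate", "Field Techniques"),
    ("Weapons", "Technical", "Weapons"),
]
rate = perfect_agreement_rate(tasks)
print(f"{rate:.1%} of tasks had perfect agreement")
```

Under this definition a task counts as agreed only when all three labels match, which is stricter than pairwise agreement.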

Scores for the functional categories were computed by taking the sum of
the hands-on task test steps (adjusted for length) or job knowledge test items
in each category.

Separate principal components analyses were then carried out for each
MOS, using the functional category score intercorrelation matrix as the input.
The results of factor analyses performed in each of the nine MOS suggested a
similar set of category clusters, with minor differences, across all nine MOS.
The ten functional categories that cut across MOS and the several technical
functional categories that were unique to particular MOS were reduced to six
basic scores:

(1) Communications - including the Communications functional category.

(2)  Vehicles  -  including  the  Vehicle  Operation  functional  category, 
and  for  MOS  63B  only  the  Vehicle  Operation  and  Recovery  category; 
for  MOS  64C,  the  Vehicle  Operation  functional  category  went  into 
the  Technical  cluster. 

(3)  Basic  Soldiering  -  including  the  Navigate,  Weapons,  Field 
Techniques,  Customs  and  Laws,  and  Anti-Air/Tank  Weapons 
categories. 

(4)  Identify  Targets  -  including  the  Identify  Targets  functional 
category. 

(5) Safety/Survival - including the First Aid and NBC functional
categories.

(6)  Technical  -  including  the  functional  categories  peculiar  to  each 
MOS,  comprising  (usually)  MOS-specific  tasks;  for  MOS  64C,  this 
cluster  included  the  Vehicle  Operation  category,  which  comprises 
tasks  central  to  the  64C  Job. 

Although this set of clusters was not reproduced precisely for every one
of the MOS, it appeared to be a reasonable portrayal of the nine jobs when a
common set of clusters was imposed on all. Tables 4-13 and 4-14 show the
range of correlations among the clusters and between the categories and the
clusters, across the nine MOS.
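
To make the principal components step more concrete, the sketch below extracts the first principal component of a small intercorrelation matrix by power iteration. The matrix values are hypothetical, the function name is an assumption, and power iteration stands in for whatever routine the project staff actually used; it is meant only to show how category scores that correlate with one another end up loading on a shared component.

```python
import math


def first_principal_component(corr, iterations=500):
    """Return (eigenvalue, loadings) of the dominant component of corr.

    corr is a symmetric correlation matrix given as a list of lists.
    Loadings are the eigenvector scaled by the square root of the eigenvalue.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    cv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * cv[i] for i in range(n))
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


# Hypothetical intercorrelations among four functional category scores.
corr = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
]
eigenvalue, loadings = first_principal_component(corr)
print(round(eigenvalue, 3), [round(x, 2) for x in loadings])
```

A full analysis would extract further components (and usually rotate them) before deciding which categories cluster together; this sketch stops at the first component.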
First Aid
NBC


Weapons

Navigate

Field Techniques
Customs and Laws


Communications


Identify Targets
Anti-Air/Tank Weapons
Vehicle Operation


13B - Cannon Crewman
Prepare, Operate, Maintain
Howitzer and Ammunition
Operate Howitzer Sights and
Alignment Devices

19E - Tank Crewman
Operate Tanks
Tank Gunnery


Generators

TTY Station and Net Operations
Maintain TTY Electronic Equipment
Operate TTY Electronic Equipment
Install TTY Electronic Equipment

63B - Light Wheel Vehicle Mechanic
Electrical System
Fuel/Cooling/Lubricating
Brake/Steering/Suspension Systems
Vehicle Operation and Recovery


Forms/Files Management
Supervision/Coordination
Correspondence
Classified Material


Clinic/Ward Treatment and Care
Clinic/Ward Housekeeping
Clinic/Ward Management


Responding to Alarms
Patrol Duties
Conduct MP Procedures


Figure 4-7. Functional task categories.
Table 4-13


Correlations Between Criterion Factor Scores and Functional Categories for Job
Knowledge Component


Commun.  Vehicle  Basic  Identify  Survival  Technical


FACTORS

Communications

Vehicles

Basic

Identify Tgts.

Survival

Technical

FUNCTIONAL CATEGORIES
Communications
Vehicle Ops.

Navigate
Field Tech.

Weapons

Anti Air/Tank Wpns.
Customs & Laws
Identify Tgts.

First Aid
NBC

Technical: 13B
19E
31C
63B
64C
71L
91A
95B


15-48 

21-56 

13-42 

12-65 

Qin 

21-28 

21-2B 

12-45 

06-30 

09-46 

04-27 

12-41 

10-39 

14 

• 

13-33 

11-30 

09-21 

12-15 

09-35 

12-25 

15-51 

11-41 

18-21 

«• 

36 

•  ' 

34-49 

14-35 

• 

• 

35-62 

• 

01-13 

20-31 

06-20 

17-51 

09-21 

09-48 

12-15 

65-79 

12-32 

36-93 

08-39 

67-35 

04-35 

32 

20 

56-67 

03-20 

10-42 

ra 

31-55 

06-26 

41-62 

05-26 

47-56 

18-24 

52-55 

28-29 

32-57 

13-29 

37-56 

• 

55 

11 

29-43 

. 

20-55 

-03-19 

33-53 

12-17 

15-50 

21-58 

22-28 

20-35 

25-57 

31-48 

13-63 

24-55 

37-62 

34-59 

26 

• 

31-47 

36-44 

07-32 

11-33 

63-98 

30-73 

78-89 

39-61 

42-51 

75-97 

47-48 

80-88 

38-51 

65-81 

29-44 

62-91 

50 

100 

26-39 

53-88 

42-76 

45-98 

28-46 

63-85 

NOTE: The numbers shown are the range of correlations that resulted for
individual MOS; under the Technical functional category, however,
the range of correlations is shown across the individual MOS Technical
functional categories. Decimals have been omitted in the correlations.
Table 4-14

Correlations Between Criterion Factor Scores and Functional Categories for
Hands-On Component


Communications   100   10-29  05-26  02-20  07-30


Vehicle Ops.
Navigate
Field Tech.

Weapons

Anti Air/Tank Wpns.
Customs & Laws

First Aid
NBC

Technical: 13B
19E
31C
63B
64C
71L
91A
95B
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1  100 
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04-11 

07-13 
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07-15  11-16 
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05-40 

06-22 

26-42  12-16 

16-19  16-23 

12-26  00-18 

06-07  01-05 

12  11 

10-20  10-11 

01-23  00-32 

17  12 
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07-37 


-02 
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04-22 


NOTE: The numbers shown are the range of correlations that resulted for
individual MOS; under the Technical functional category, however,
the range of correlations is shown across the individual MOS Technical
functional categories. Decimals have been omitted in the correlations.
Training Test Scores


Criterion scores for the training knowledge tests were derived in the
same way as for the job knowledge tests. The results of the expert judgments
and the exploratory factor analyses suggested that the six-score solution was
also a reasonable one. Consequently, in the subsequent analyses aimed at
developing a comprehensive model of job performance, the six content
categories were scored in each of the three tests (hands-on, job knowledge, school
knowledge) in each MOS in Batch A.

Basic  Scores  From  the  Rating  Scales 

For each soldier ratee in the sample, the goal was to obtain ratings
from two supervisors and four peers who had worked with the ratee for at least
two months and/or were sufficiently familiar with the ratee's job performance.
The specific procedures used to identify peer and supervisor raters can be
found in Pulakos and Borman (1986). Overall, there was an
average of 3.1 peer and 1.9 supervisor ratings for each ratee. The number of
raters per ratee was sufficient to allow reasonable estimates of interrater
reliability.

Raters did not succumb to excessive central tendency or leniency. The
mean ratings were between 4 and 5 on the 7-point scales and the standard
deviations were generally over 1.00.

Interrater  Reliability 

Interrater reliabilities were estimated with the intraclass correlation
coefficient. In general, reliabilities of the individual scales were in the
.30 - .45 range, and the reliabilities of the sums of the Army-wide and
MOS-specific scales were .65 and .55, respectively, using supervisor ratings. For peer
ratings, the mean reliabilities were .58 and .42.
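
For readers unfamiliar with the intraclass correlation, the sketch below computes the one-way random-effects form for a single rating and for the mean of k ratings. It assumes a balanced design (the same number of raters for every ratee), which the Project A data did not strictly have, and it is not claimed to be the exact variant the project staff used. The ratings shown are hypothetical.

```python
def icc_one_way(ratings):
    """One-way random-effects intraclass correlations.

    ratings: list of per-ratee lists, each holding the same number k of ratings.
    Returns (single-rater ICC, ICC of the mean of k raters).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    ratee_means = [sum(row) / k for row in ratings]
    # Between-ratee and within-ratee mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in ratee_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, ratee_means) for x in row
    ) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical 7-point ratings: four ratees, two raters each.
ratings = [[4, 5], [3, 3], [6, 5], [2, 4]]
single, average = icc_one_way(ratings)
print(single, average)
```

The second value is the Spearman-Brown step-up of the first, which is why averaging over several raters yields a more reliable score than any single rating.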

Factor Analysis of the Rating Scales

The reduction of the individual rating scales to a smaller set of
aggregated scores was accomplished largely by means of exploratory factor
analysis.

Army-Wide Performance Rating Scales. Principal factor analyses with a
varimax rotation for the Army-wide scales were performed across MOS for peer
raters, for supervisor raters, and for the combined peer and supervisor rater
groups. Virtually identical results were obtained for all three rater groups,
and a three-factor solution was chosen as the most meaningful. The names of
the factors and the rating dimensions loading highest on each factor are shown
in Table 4-15. Loadings for the rotated factor solution for the combined
group are shown in Table 4-16.

To determine how well the factor solution would hold up within
individual MOS, factor scores using the factor scoring matrixes generated from the
analyses across MOS were computed within the peer rater group, within the
supervisor rater group, and for the combined peer and supervisor rater group.
Then, correlations were computed between the factor scores and the original
behavioral dimension ratings. These analyses generally supported the
stability and appropriateness of the three-factor structure across rating source and
MOS.


Table 4-15

Army-Wide Performance Rating Scales Factors


Factor 1: Job-Relevant Skills and Motivation
    Technical Knowledge/Skill
    Leadership
    Effort
    Self-Development
    Maintaining Equipment

Factor 2: Personal Discipline
    Following Regulations
    Self-Control
    Integrity

Factor 3: Physical Fitness and Military Bearing
    Military Appearance
    Physical Fitness

Table 4-16

Army-Wide Performance Rating Scales Three-Factor Solution for
Combined Peer and Supervisor Raters


                                  Rotated Factor Pattern(a)

Dimension                      Factor 1   Factor 2   Factor 3

A: Technical Skill                .71        .28        .30
E: Leadership                     .69        .30        .37
B: Effort                         .69        .43        .26
I: Self-Development               .57        .38        .38
F: Maintaining Equipment          .54        .34        .35
C: Following Regulations          .41        .69        .30
J: Self-Control                   .22        .63        .20
D: Integrity                      .50        .59        .28
G: Military Appearance            .32        .32        .57
H: Physical Fitness               .21        .15        .49


(a) Factor 1 - Job-Relevant Skills and Motivation; Factor 2 - Personal
Discipline; Factor 3 - Physical Fitness and Military Bearing
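
One way to turn a rotated pattern like the one in Table 4-16 into scores is to assign each rating dimension to the factor on which it loads highest and then sum the assigned ratings with unit weights. The sketch below does this with the loadings as read from Table 4-16; the function names, the highest-loading assignment rule, and the sample ratee are assumptions made for illustration, not a description of the project's scoring program.

```python
FACTOR_NAMES = (
    "Job-Relevant Skills and Motivation",
    "Personal Discipline",
    "Physical Fitness and Military Bearing",
)

# Loadings (Factor 1, Factor 2, Factor 3) as read from Table 4-16.
LOADINGS = {
    "Technical Skill": (0.71, 0.28, 0.30),
    "Leadership": (0.69, 0.30, 0.37),
    "Effort": (0.69, 0.43, 0.26),
    "Self-Development": (0.57, 0.38, 0.38),
    "Maintaining Equipment": (0.54, 0.34, 0.35),
    "Following Regulations": (0.41, 0.69, 0.30),
    "Self-Control": (0.22, 0.63, 0.20),
    "Integrity": (0.50, 0.59, 0.28),
    "Military Appearance": (0.32, 0.32, 0.57),
    "Physical Fitness": (0.21, 0.15, 0.49),
}


def group_by_highest_loading(loadings):
    """Assign each dimension to the factor with its largest loading."""
    groups = {name: [] for name in FACTOR_NAMES}
    for dimension, row in loadings.items():
        best = max(range(len(row)), key=lambda i: row[i])
        groups[FACTOR_NAMES[best]].append(dimension)
    return groups


def unit_weighted_scores(ratings, groups):
    """Sum the ratings of the dimensions assigned to each factor."""
    return {factor: sum(ratings[d] for d in dims) for factor, dims in groups.items()}


groups = group_by_highest_loading(LOADINGS)
ratings = {dimension: 4 for dimension in LOADINGS}  # hypothetical ratee
scores = unit_weighted_scores(ratings, groups)
print(scores)
```

The resulting grouping matches the dimension lists in Table 4-15, which is a useful check that the loadings were read correctly.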
MOS-Specific Performance Rating Scales. For the MOS-specific scales,
principal factor analyses with a varimax rotation were conducted within MOS
and separately for the peer and supervisor raters. The objective was to look
for common themes that might be evident across MOS, even though different
dimensions comprised each of the nine sets of scales.

Inspection of the factor analyses revealed a two-factor solution that
could be used for all nine MOS. The rating dimensions loading highest on one
of the factors consisted mainly of core job requirements and tasks, while
those loading highest on the second factor were more peripheral job duties.
Accordingly, for all MOS, a two-factor solution was chosen to represent the
MOS-specific aspect of the criterion domain, with the factors named as
follows: Core Responsibilities, and Other Responsibilities.

Combat Effectiveness Ratings. The combat scales were Army-wide summated
scales based on the 40 items that survived the field tests and were designed
to evaluate performance under degraded conditions and the increased confusion,
workload, and uncertainty of a combat environment. A factor analysis of these
items based on the combined samples from the Concurrent Validation suggested
that two factors could be extracted. The first factor contained items that
seemed to reflect performance under adverse, difficult, or dangerous
conditions. The second was composed largely of items dealing with making mistakes,
getting into trouble, or creating discipline problems. Consequently, items
within each factor were summed to produce two scores for expected combat
effectiveness: Performing Under Adverse Conditions and Avoiding Mistakes.

Army-Wide Common Task Ratings. The distributional properties,
reliabilities, and factor structure of the 11 common task rating scales were analyzed
using the same procedure as for the Army-wide performance scales. In general,
these scales showed greater central tendency, lower reliabilities, and a less
clear factor structure. Consequently, they were not used in the final
criterion scoring.

Summary

To summarize the results of the rating scale score analyses:

• A three-factor solution (Job-Relevant Skills and Motivation,
Personal Discipline, and Physical Fitness and Military
Bearing) was chosen as the most psychologically meaningful
for the Army-wide performance rating scales.

• Factor analyses of the MOS-specific rating scales yielded a
two-factor solution across all nine MOS (Core Responsibilities
and Other Responsibilities).

• Factor analysis of the combat rating scales, using the
combined sample, also produced a two-factor solution
(Performing Under Adverse Conditions and Avoiding Mistakes).
MODELING OF CRITERION PERFORMANCE AND DEVELOPMENT OF
CRITERION FACTOR SCORES

Adding all the basic criterion scores into a single composite was viewed
as too atheoretical, and developing a reliable and homogeneous measure of the
general factor violated the basic notion that performance is multidimensional.
A more formal way to model performance is to think in terms of its latent
structure, postulate what that might be, and then resort to a confirmatory
analysis.

Before any of the CV data were analyzed, the best speculation of the
Project A staff had produced a preliminary model, shown in Figure 4-8. It
went beyond what the Concurrent Validation data could examine and is included
here only to illustrate the first stage in an almost continuous process of
bootstrapping toward a more final conceptual description of the predictor/
criterion space.

Successive revisions of the target model were then subjected to what
might be described as "quasi" confirmatory analysis, using data from the
Concurrent Validation sample. The purpose was to consider whether a single
model of the latent structure of job performance would fit the data from all
nine jobs. The analyses supporting this effort are summarized below.

Procedure

The results of the first level of aggregation have been referred to as
the "basic" array of criterion scores. This reduced array of criterion
variables is shown in Table 4-17. Because MOS do differ in their task
content, not all 31 variables were scored in each MOS and there was some slight
variation in the number of variables used in the subsequent analyses.


Table  4-17 

Thirty-One Basic Criterion Scores Obtained by Aggregating Individual Rating
Scales, Job Sample Tasks, Knowledge Test Items, and Archival Records

1. Single scale rating of overall performance.

Three Unit-Weighted Factor Scores Obtained from Factor Analysis of the 10
Army-Wide Behaviorally Anchored Rating Scales.

2. Effort and leadership factor.

3. Personal discipline factor.

4. Physical fitness and military bearing factor.

Two Unit-Weighted Factor Scores Obtained via Factor Analysis of the
Job-Specific Behaviorally Anchored Rating Scales Developed for Each Job.

5. Core responsibilities factor.

6. Peripheral responsibilities factor.

Two Unit-Weighted Factor Scores Obtained from the Expected Combat Performance
Summated Rating Scale.

7.  Performing  well  under  adverse  conditions  factor. 

8.  Avoiding  mistakes  factor. 

Archival/Administrative Performance Indicators.

9.  Awards  and  certificates. 

10.  Physical  readiness  test  score. 

11. M16 qualification score.

12.  Articles  15/flag  actions. 

13.  Promotion  rate  deviation  score. 

Task  Proficiency  Scale  Scores  Obtained  by  Clustering  Items  for  Hands-On  Job 
Sample  Tests  (HO). 

14.  Core  technical  (MOS-specific). 

15.  Communications. 

16.  Vehicle  operation  and  maintenance. 

17.  General  soldiering. 

18.  Identifying  target  and  threat  vehicles  and  aircraft. 

19.  Safety  and  survival. 

Job  Knowledge  Scale  Scores  Obtained  by  Clustering  Items  From  Job  Knowledge 
Tests  (JK). 

20.  Core  technical  (MOS-specific). 

21.  Communications. 

22. Vehicle operation and maintenance.

23.  General  soldiering. 

24.  Identifying  target  and  threat  vehicles  and  aircraft. 

25.  Safety  and  survival. 

Training  Knowledge  Scale  Scores  Obtained  by  Clustering  Items  From  Training 
School  Knowledge  Tests  (SK). 

26. Core technical (MOS-specific).

27.  Communications. 

28.  Vehicle  operation  and  maintenance. 

29.  General  soldiering. 

30.  Identifying  target  and  threat  vehicles  and  aircraft. 

31.  Safety  and  survival. 
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A  Revised  Model  of  Job  Performance 


Construction  of  a  Target  Model 

The  next  step  was  to  build  a  revised  target  model  of  job  performance 
that  could  be  tested  for  goodness-of-fit  within  each  of  the  nine  Jobs,  using 
the CV data. To do this, the intercorrelation matrixes of the basic criterion
scores  for  the  nine  MOS  were  each  subjected  to  another  round  of  empirical 
factor  analysis  to  suggest  possible  modifications. 

Several  consistent  results  were  observed.  First,  as  expected,  there  was 
the general prominence of "methods" factors, specifically one methods factor
for  the  ratings  and  one  methods  factor  for  the  written  tests.  Secondly,  there 
was  a  close  correspondence  between  the  administrative  measures  scales  and  the 
three  Army-wide  rating  factors.  The  awards  and  certificates  scale  from  the 
administrative  measures  loaded  together  with  the  Army-wide  Effort/Leadership 
rating  factor;  the  Articles  15  score  and  the  promotion  rate  scale  loaded  with 
the  Personal  Discipline  factor. 

Based  on  such  findings,  a  revised  model  was  constructed  to  account  for 
the  correlations  among  performance  measures.  It  included  five  Job  performance 
constructs  which  are  defined  in  Figure  4-9. 

An  issue  that  remained  was  whether  the  Job-specific  BARS  were  measuring 
Job-specific  technical  knowledge  and  skill,  or  effort  and  leadership,  or  both. 
For  purposes  of  model  fitting  the  MOS-specific  BARS  core  factor  was  hypo¬ 
thesized  to  load  on  both  Core  Technical  and  Effort/Leadership. 

Another issue was whether it was necessary to posit hands-on and
administrative measures "methods" factors to account for the intercorrelations
within each of these sets of measures. Since the average intercorrelation
among the scores within each of these sets was not particularly high, the
hypothesized model did not include these two additional methods factors. However, it
did include the ratings and written test methods factors. Consequently, the
complete model specified the following seven factors:

1.  Core  Technical  Proficiency 

2.  General  Soldiering  Proficiency 

3.  Effort  and  Leadership 

4. Personal Discipline

5.  Physical  Fitness  and  Military  Bearing 

6.  Ratings  method  factor 

7.  Paper-and-pencil  method  factor 
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A.  Cgntf  JKhnKfll.Pr^fitlsncy 

This performance construct represents the proficiency with which the soldier
performs the tasks that are "central" to the MOS. The tasks represent the
core of the job and they are the primary definers of the MOS. For example,
the first-tour Armor Crewman starts and stops the tank engines; loads and
unloads the main gun; boresights the M60A3; engages targets with the main gun;
and performs misfire procedures. This performance construct does not include
the individual's willingness to perform the task or the degree to which the
individual can coordinate efforts with others. It refers to how well the
individual can execute the core technical tasks the job requires, given a
willingness to do so.

2.  General  Soldiering  Proficiency 

In addition to the core technical content specific to an MOS, individuals in
every MOS also are responsible for being able to perform a variety of general
soldiering tasks; for example, determines grid coordinates on military maps;
puts on, wears, and removes M17 protective mask with hood; determines a
magnetic azimuth using a compass; and recognizes and identifies friendly and
threat aircraft. Performance on this construct represents overall proficiency
on these general soldiering tasks. Again, it refers to how well the individual
can execute general soldiering tasks, given a willingness to do so.

3.  Effort  and  Leadership 

This performance construct reflects the degree to which the individual exerts
effort over the full range of job tasks, perseveres under adverse or dangerous
conditions, and demonstrates leadership and support toward peers. That is,
can the individual be counted on to carry out assigned tasks, even under
adverse conditions, to exercise good judgment, and to be generally dependable
and proficient? While appropriate knowledges and skills are necessary for
successful performance, this construct is meant only to reflect the
individual's willingness to do the job required and to be cooperative and
supportive with other soldiers.

4.  Personal  Discipline

This performance construct reflects the degree to which the individual adheres
to Army regulations and traditions, exercises personal self-control,
demonstrates integrity in day-to-day behavior, and does not create disciplinary
problems. People who rank high on this construct show a commitment to high
standards of personal conduct.

5.  Physical  Fitness  and  Military  Bearing

This performance construct represents the degree to which the individual
maintains an appropriate military appearance and bearing and stays in good
physical condition.


Figure 4-9.  Definitions of the Job Performance Constructs.




Confirmation  of  the  Model  Within  Each  Job 

The next step in the analysis was to conduct separate tests of goodness-
of-fit of this target model within each of the nine jobs. This was done using
the LISREL confirmatory factor analysis program (Joreskog & Sorbom, 1981).

As is not uncommon when using confirmatory models, some problems were
encountered in fitting the hypothesized model to several of the jobs. Some
factor loadings were greater than one, with negative uniqueness estimates for
the corresponding observed variables. Also, estimates of the correlations
among the performance constructs occasionally exceeded unity. These problems
necessitated a certain amount of ad hoc cutting and fitting in the form of
computing the squared multiple correlation (SMC) for predicting each observed
variable from all of the other variables, and setting the uniqueness estimates
(i.e., the Theta-Epsilon diagonal) to 1.0 minus this SMC. This approach
eliminated all factor loadings and correlations greater than one. In most
cases, a second "iteration" was performed to adjust the initial uniqueness
estimates (Theta-Epsilon) so that the diagonal of the estimated correlation
matrix would be as close to 1.0 as possible. The final factor loading
estimates for each job are shown in Table 4-18.
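
The SMC step described above can be illustrated with a minimal sketch. It
relies on the standard identity that the SMC of variable i equals 1 minus the
reciprocal of the i-th diagonal element of the inverse correlation matrix. The
3 x 3 correlation matrix and the function names are hypothetical, chosen only
for illustration; this is not the LISREL setup used in the project.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def smc_uniquenesses(corr):
    """Return (SMCs, uniquenesses), where uniqueness = 1.0 - SMC."""
    inv = invert(corr)
    smc = [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]
    return smc, [1.0 - s for s in smc]


# Hypothetical correlations among three observed criterion measures.
R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.3],
     [0.4, 0.3, 1.0]]

smc, uniq = smc_uniquenesses(R)
for i, (s, u) in enumerate(zip(smc, uniq), start=1):
    print(f"variable {i}: SMC = {s:.3f}, fixed uniqueness = {u:.3f}")
```

Because an SMC computed from a proper correlation matrix lies between 0 and 1,
uniquenesses fixed this way cannot be negative, which is why the approach
removes the out-of-range estimates.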

LISREL also computes a goodness-of-fit index based on a comparison of
the actual correlations among the observed variables and the estimated
correlations. The goodness-of-fit index is distributed as chi-square, with
degrees of freedom dependent on the number of observed variables and the
number of parameters estimated. The expected value of chi-square is equal to
the degrees of freedom; a value much larger than the degrees of freedom is a
sign that the model does not fit the correlations among the observed
variables.

However, the chi-square values should be interpreted with caution
because the hypothesized target model was based in part on analyses of these
same data. In addition, LISREL was "told" that the Theta-Epsilon (uniqueness)
parameters all were fixed, and therefore did not "use up" degrees of freedom
estimating these parameters; in fact, these values were estimated entirely
from the data.


Confirmation  of  an  Overall  Model 

The results of the confirmatory procedures applied to the performance
measures from each job generally supported a common structure of job
performance. A final step was to determine whether the variation in some of
these parameters across jobs could be attributed to sampling variation by
hypothesizing that (a) the correlation among factors was invariant across
jobs, and (b) the loadings of all of the Army-wide measures on the performance
constructs and on the rating method factor were also constant across jobs.
Table 4-18

Factor Loadings: Separate Model of Job Performance for Each Job

                                               MOS
Construct/Factor(a)        11B   13B   19E   31C   63B   64C   71L   91A   95B

Core Technical
  HO Technical             .61   .47   .64   .51   .29   .77   .59   .32
  JK Technical              --   .75   .78   .79   .74   .26   .78   .75   .32
  SK Technical              --   .70   .79   .73   .82   .55   .22   .81   .43
  MOS Tech Rating           --   .45   .10   .22   .25   .25   .34   .10   .13

General Soldiering
  HO Soldier               .60   .51   .46   .64   .17   .50   .50   .42   .60
  HO Safety                .26   .33   .32   .31   .12   .63   .37   .48   .47
  HO Communications        .05   .06   .39   .56    --    --   .80
  HO Vehicle                --    --    --   .2?   .17     b    --    --   .31
  JK Soldier               .76   .52   .74   .62   .45   .48   .87   .58   .46
  JK Safety                .55   .37   .75   .38   .71   .51   .72   .58   .33
  JK Communications        .30   .23   .65   .38    --    --    --   .29
  JK Vehicle                --   .17    --   .10   .41     b    --    --   .35
  JK Identify              .46    --   .20   .28   .12    --   .24   .21
  SK Soldier               .73   .45   .67   .39   .78   .56   .45   .44   .42
  SK Safety                .47   .32   .53   .62   .57   .47   .30   .64   .32
  SK Communications        .42   .26   .42    --   .41   .35   .20   .20
  SK Vehicle               .22   .24   .05   .30   .61     b   .22   .47   .28
  SK Identify              .46    --   .46   .13    --    --    --

Effort/Leadership
  Eff/Ldr Rating           .76   .36   .85   .64   .68   .83   .66   .76   .70
  MOS Tech Ratings         .70    --   .63   .40   .41   .25   .59   .52
  MOS Other Rating         .77   .41   .48   .43   .54   .62   .43   .61   .56
  Combat Exemplary         .30   .47   .60   .54   .57   .87   .63   .80   .77
  Combat Problems          .48   .20   .39   .52   .53   .55   .56
  Awards/Certificate       .32   .23   .24   .19   .28   .25   .34   .34   .22
  Overall Rating           .46   .39   .33   .17   .57   .42   .65   .41

Discipline
  Discipline Rating        .77   .58   .73   .45   .63   .85   .74   .58   .73
  Combat Problems          .23   .16   .62   .03   .05   .19   .02   .33
  Articles 15             -.63  -.61  -.55  -.62  -.65  -.47  -.69  -.46  -.50
  Promotion Rate           .74   .61   .68   .79   .63   .57   .59   .54   .54
  Overall Rating           .33   .20   .53   .54   .09   .42   .06   .75   .38

(Continued)
Table 4-18 (Continued)

Factor Loadings: Separate Model of Job Performance for Each Job

                                               MOS
Construct/Factor(a)        11B   13B   19E   31C   63B   64C   71L   91A   95B

Fitness/Bearing
  Fitness Rating           .69   .23   .84   .48   .54   .42   .50   .60   .78
  Physical Readiness       .11   .90   .49   .89   .70   .53   .76   .69   .69

Ratings Method
  AW Ratings               .60   .73   .47   .70   .66   .54   .65   .66   .66
  MOS Ratings              .73   .73   .60   .69   .67   .49   .69   .54   .63
  Combat Ratings           .47   .65   .55   .69   .57   .27   .55   .47   .40

Written Method
  JK Technical              --   .47   .28   .55   .59   .73   .44   .58   .57
  JK Soldier               .41   .51   .33   .40   .61   .57   .11   .37   .59
  JK Safety                .37   .52   .12   .63   .08   .49   .17   .76   .57
  JK Communications        .34   .11   .07   .55    --    --   .52
  JK Vehicle                --    --    --   .42   .62     b    --   .24   .21
  JK Identify             -.15   .23   .50   .36    --   .05    --   .08   .23
  SK Technical              --   .48   .48   .55   .46   .88   .42   .27   .50
  SK Soldier               .50   .66   .54   .59   .15   .51   .54    --   .54
  SK Safety                .53   .55   .42   .29   .34   .48   .44   .19   .60
  SK Communications        .51   .47   .46    --   .16   .24   .05    --   .42
  SK Vehicle               .49   .57   .24   .48   .55     b   .38   .05   .42
  SK Identify              .21    --   .42   .44    --    --    --    --    --

M16 Qualification          .71   .71   .71   .71   .71   .71   .71   .71   .71

(a) HO = Hands-on; JK = Job Knowledge; SK = School Knowledge; AW = Army-Wide.
(b) Vehicle content was merged into the Core Technical factor for MOS 64C.


The proposed overall model was a relatively stringent test of a common
latent structure since it was quite possible that selectivity differences in
the different jobs would tend to make it appear that the different jobs
require different performance models, when in fact they do not. However, the
overall model fit very well. The root mean square residual was .047, and
chi-square was 2508.1 with 2403 degrees of freedom after adjusting for missing
variables and the use of the data in estimating uniqueness. Table 4-19 shows
the final mapping of the criterion measures on the five performance
components.
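
As a rough check on how close the reported chi-square is to its expected
value, the sketch below computes the chi-square to degrees-of-freedom ratio
and a large-sample normal approximation (a chi-square variable has mean df and
variance 2 df). The two input values come from the text; the approximation and
the variable names are illustrative assumptions, not part of the original
analysis.

```python
import math

chi_square = 2508.1   # reported for the overall model
df = 2403             # reported degrees of freedom

# A ratio near 1.0 means chi-square is close to its expected value (df).
ratio = chi_square / df

# Large-df normal approximation: (chi-square - df) / sqrt(2 * df).
z = (chi_square - df) / math.sqrt(2 * df)

print(f"chi-square / df = {ratio:.3f}")
print(f"approximate z   = {z:.2f}")
```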


Obtaining Criterion Factor Scores for Individuals

To obtain an individual's score on each of the five constructs, the
variables composing each factor were scored and combined in the following
manner.

The Core Technical Proficiency construct is operationally defined as the
standardized sum of the MOS-specific technical task content from the hands-on
tests, the job knowledge tests, and the school knowledge tests.

The General Soldiering Proficiency score is also composed of two major
components, each of which is standardized and then added to generate the
criterion score. The first component is operationally defined as the sum of
the CVBIS* scores from the hands-on test, and the second component is defined
as the sum of the CVBIS scores from both the job knowledge and school
knowledge tests.

The Effort/Leadership criterion factor is composed of four major
components, each of which is standardized before the four are summed. The
first component corresponds to the single rating for Overall Effectiveness.
The second component is composed of three subcomponents. The first is one of
the three factor scores derived from the Army-wide BARS scales (i.e., the
Army-wide Effort/Leadership factor) and consists of the unit-weighted sum of
five different scales (Technical Skill; Effort; Leadership; Maintain
Equipment; Self Development). The second and third subcomponents are the two
factor scores derived from the MOS-specific BARS rating scales. (It should be
noted that all rating scores used in the computation of all criterion
constructs are the average of the ratings provided by supervisors and peers.)
The third component is the average of the two combat rating scales. Finally,
the fourth component corresponds to the administrative measure identified as
Total Awards/Letters.


*A set of content categories derived from the hands-on and knowledge test
variables, where tasks and items were assigned as follows: Communication (radio
operation); Vehicle Maintenance; Basic Soldiering Skills (field techniques,
weapons, navigation, customs and law); Identify (friendly and enemy aircraft and
vehicles); Technical Skills (specific to the job); Safety/Survival (first aid,
NBC).
The Personal Discipline factor is composed of two major components, each
of which is standardized before the two are added. The first component is the
Personal Discipline score derived from the Army-wide BARS and consists of the
unit-weighted sum of three different scales (Following Regulations; Integrity;
Self-Control). The second component is the sum of two administrative
measures, Articles 15/Flag Actions and Promotion Rate Deviation score.

The fifth criterion factor, Physical Fitness and Military Bearing, is
composed of two components; again, each is standardized before they are added
to generate a criterion score. The first component is the Physical Fitness
and Bearing score derived from the Army-wide BARS and consists of the unit-
weighted sum of two different scales (Military Appearance; Physical Fitness).
The second component corresponds to the administrative measure identified as
the Physical Readiness score.
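
The scoring rule shared by these constructs (standardize each component
across soldiers, then add) can be sketched as follows, using the Physical
Fitness and Military Bearing factor as the example. All data values and
function names are hypothetical, and the use of the population standard
deviation is an assumption; the text does not say which form was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores across soldiers."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(*components):
    """Standardize each component, then sum the z-scores within soldier."""
    z_columns = [standardize(c) for c in components]
    return [sum(zs) for zs in zip(*z_columns)]


# Hypothetical data for four soldiers.
military_appearance = [5, 4, 6, 3]       # Army-wide BARS scale
physical_fitness = [6, 4, 5, 3]          # Army-wide BARS scale
physical_readiness = [250, 210, 270, 190]  # administrative measure

# Component 1: unit-weighted sum of the two BARS scales.
bars_component = [a + f for a, f in zip(military_appearance, physical_fitness)]

# Criterion score: standardized BARS component plus standardized readiness.
fitness_bearing = composite(bars_component, physical_readiness)
print([round(score, 2) for score in fitness_bearing])
```

Standardizing before summing keeps a component with a large raw-score scale,
such as the readiness score here, from dominating the composite.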

Five residual scores were then created from the five criterion factors
by partialing the paper-and-pencil methods factor from Core Technical and
General Soldiering and the ratings methods factor from Effort/Leadership,
Personal Discipline, and Fitness and Bearing.
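
One common way to partial a variable out of a score is to regress the score
on that variable and keep the residuals. The sketch below assumes a single
method-factor score per soldier is available; how the method factor scores
were actually obtained is not stated here, so the data, names, and the simple
one-predictor regression are illustrative assumptions only.

```python
from statistics import mean


def residualize(criterion, method):
    """Return criterion scores with the linear effect of method removed."""
    mx, my = mean(method), mean(criterion)
    sxx = sum((x - mx) ** 2 for x in method)
    sxy = sum((x - mx) * (y - my) for x, y in zip(method, criterion))
    slope = sxy / sxx
    # Residual = observed deviation minus the deviation predicted by method.
    return [(y - my) - slope * (x - mx) for x, y in zip(method, criterion)]


# Hypothetical standardized scores for five soldiers.
ratings_method = [0.2, -1.0, 0.5, 1.3, -0.7]
effort_leadership = [0.9, -0.8, 0.1, 1.5, -1.2]

effort_residual = residualize(effort_leadership, ratings_method)
print([round(r, 3) for r in effort_residual])
```

By construction the residual scores are uncorrelated with the method factor,
so any remaining relationship with a predictor cannot be attributed to that
shared measurement method.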

Criterion Intercorrelations

The five criterion factor scores, the five criterion residual scores,
the single rating obtained from the overall performance rating scales, and the
total score from the hands-on tests were used to generate a 12 x 12 matrix of
criterion intercorrelations for each MOS in Batch A. The averages of these
correlations across MOS are shown in Table 4-20. The intercorrelations
between factor scores within method (factor 1 with 2 or 3 with 4) are higher,
as expected, than those for factor pairs which do not confound method (e.g.,
1 with 3 or 2 with 4). However, they are not so high that collapsing the five
factors into some smaller number would be justified. In fact, factors 1 and
2, which intercorrelate .53 on the average, yield different profiles of
correlations with the tests in the predictor battery.

Assuming a reliability of about .60 for each measure, the correlation of
the overall performance rating with the total hands-on score would be about
.34 when corrected for attenuation. A reasonable conclusion is that while
performance on a standardized job sample is a significant component of
performance, it is by no means all of it.
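
The correction for attenuation divides an observed correlation by the square
root of the product of the two reliabilities. The sketch below works backward
from the figures in the text: with both reliabilities at .60, a corrected
value of .34 implies an observed correlation of about .20. That implied
observed value is derived here for illustration; it is not quoted from
Table 4-20.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Disattenuated correlation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


reliability = 0.60   # assumed for each measure, as in the text
corrected = 0.34     # corrected correlation cited in the text

# Observed correlation implied by the two figures above.
implied_observed = corrected * math.sqrt(reliability * reliability)

print(f"implied observed r = {implied_observed:.3f}")
print(f"corrected r        = "
      f"{correct_for_attenuation(implied_observed, reliability, reliability):.2f}")
```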

The correlations of the residualized factor 3 (Effort/Leadership
residual) with the Core Technical factor, the Core Technical residual, the
General Soldiering Proficiency factor, the overall rating scale, and the
hands-on total score all are about the same. Also, as compared to the
correlation of the Effort/Leadership raw scores with these same variables, the
correlations of the Effort/Leadership residual with the Core Technical and
General Soldiering Proficiency factors go up while the correlations with
Personal Discipline and Physical Fitness go down. Residualizing factor 3 (by
removing the ratings method factor) makes it more like a "can do" factor and
less like a "will do" factor.
Concluding Comments

In general, these intercorrelations seem to behave in very lawful ways
and are consistent with a multidimensional model of performance. In spite of
some confounding of factor content with measurement method, the latent
performance structure appears to be composed of very distinct components and
it is reasonable to expect that the different performance constructs would be
predicted by different things. Since (a) the five-factor solution is stable
across jobs sampled from this population, (b) the performance constructs seem
to make sense, and (c) the constructs are based on measures carefully
developed to be content valid, it seemed safe to ascribe some degree of
construct validity to them.


BASIC  CONCURRENT  VALIDATION  RESULTS 

As described previously, 24 scores were used to assess the predictor
domain and five criterion construct scores were developed to provide a
comprehensive assessment of job performance. Consequently, the basic
validation data generated by the Concurrent Validation are contained in the
24 x 5 correlation matrix that could be computed for each MOS in the sample.

The predictor scores were grouped into six domains and the multiple
correlation of the predictor scores within each domain with each of the
criterion construct scores was computed for each of the nine MOS in Batch A.
Figure 4-10 depicts the relationships that were expected between the predictor
domains and the five job performance constructs. Each R was corrected for
range restriction using the multivariate procedure described in Lord and
Novick (1968) and adjusted for shrinkage using the procedure described by
Claudy (1978).
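
To show what the two kinds of adjustment do, the sketch below uses simpler
stand-ins rather than the procedures named above: the univariate correction
for direct range restriction (often called Thorndike's Case 2) in place of
the multivariate Lord and Novick procedure, and the familiar adjusted
R-squared formula in place of Claudy's shrinkage formula. All input values
are hypothetical.

```python
import math


def correct_range_restriction(r, sd_ratio):
    """Univariate correction for direct range restriction.

    sd_ratio is the unrestricted SD divided by the restricted SD of the
    predictor on which selection occurred.
    """
    return (r * sd_ratio) / math.sqrt(1.0 + r * r * (sd_ratio ** 2 - 1.0))


def adjusted_multiple_r(multiple_r, n, k):
    """Shrinkage-adjusted R from the standard adjusted R-squared formula."""
    r2_adj = 1.0 - (1.0 - multiple_r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))


# Hypothetical values: observed r = .30 in a sample whose predictor SD is
# two-thirds of the applicant-population SD (sd_ratio = 1.5).
corrected_r = correct_range_restriction(0.30, 1.5)

# Hypothetical values: R = .50 from K = 4 predictors and N = 500 soldiers.
shrunken_r = adjusted_multiple_r(0.50, n=500, k=4)

print(f"range-corrected r   = {corrected_r:.3f}")
print(f"shrinkage-adjusted R = {shrunken_r:.3f}")
```

The range restriction correction raises the estimate because selected
soldiers vary less than applicants, while the shrinkage adjustment lowers it
because regression weights fit to one sample capitalize on chance.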


Initial Multiple Correlation Results

Given six predictor domains and five job performance constructs, 30
multiple correlations were generated for each MOS. The mean validity (R)
values for the nine MOS are reported in Table 4-21.

As a test of the hypothesized predictor-criterion relationships
presented in Figure 4-10, the predictor composites were grouped into the two
prescribed sets. For each set the R was computed with each of the five job
performance constructs within each of the nine jobs. Mean Rs from these
analyses are presented in Table 4-22. The pattern of correlations is very
similar to that predicted in Figure 4-10. The one surprising result is the
high correlation between the non-cognitive predictors and the two "can do"
performance constructs. This is due primarily to the validity of the AVOICE,
which has important implications for the development of optimal classification
algorithms.
PREDICTOR DOMAIN

  Cognitive Portion:      General Cognitive Ability
                          Spatial Ability
                          Perceptual-Psychomotor Ability

  Non-Cognitive Portion:  Temperament
                          Vocational Interests
                          Job Reward Preferences

CRITERION CONSTRUCT

  Core Technical Proficiency
  General Soldiering Proficiency
  Effort and Leadership
  Personal Discipline
  Physical Fitness and Military Bearing


Figure 4-10.  Hypothesized predictor-criterion relationships.
Table 4-21

Mean Validity(a) for the Composite Scores Within Each Predictor Domain
Across Nine Army Enlisted Jobs

                                          Predictor Domain
                       General              Perceptual-                       Job
Job                    Cognitive  Spatial   Psychomotor  Temper-  Vocational  Reward
Performance            Ability    Ability   Ability      ament    Interests   Prefer
Construct              (K=4)(b)   (K=1)     (K=6)        (K=4)    (K=6)       (K=3)

Core Technical
  Proficiency            .63        .56       .53          .25      .35         .29
General Soldiering
  Proficiency            .65        .63       .57          .25      .34         .30
Effort and
  Leadership             .31        .25       .26          .33      .24         .19
Personal
  Discipline             .16        .12       .12          .32      .13         .11
Physical Fitness and
  Military Bearing       .20        .10       .11          .37      .12         .11

(a) Validity coefficients were corrected for range restriction and adjusted
    for shrinkage.
(b) K is the number of predictor scores.


Table 4-22

Mean Validity(a) for the Cognitive, Non-Cognitive, and All Predictor
Composites Across Nine Army Enlisted Jobs

                                             Predictor Domain
                                     Cognitive   Non-Cognitive   All
Job Performance Construct            (K=11)(b)   (K=13)          (K=24)

Core Technical Proficiency             .65         .44            .67
General Soldiering Proficiency         .69         .44            .70
Effort and Leadership                  .32         .38            .44
Personal Discipline                    .17         .35            .37
Physical Fitness and
  Military Bearing                     .23         .38            .42

(a) Validity coefficients were corrected for range restriction and adjusted
    for shrinkage.
(b) K is the number of predictor scores.
Incremental Validity

An important question is how to improve upon the validity of decisions
made using the current selection and classification instrument. The validity of
the General Cognitive Ability scores (computed from the ASVAB) was compared to
the validity obtained when the scores from other predictor domains were added.
The resulting mean validities are reported in Table 4-23.

Table 4-23

Mean Incremental Validity(a,b) for the Composite Scores Within Each
Predictor Domain Across Nine Army Enlisted Jobs

                                          Predictor Domain
                                 General    General      General    General     General
                                 Cognitive  Cognitive    Cognitive  Cognitive   Cognitive
                                 Ability    Ability      Ability    Ability     Ability
                      General    Plus       Plus         Plus       Plus        Plus Job
Job                   Cognitive  Spatial    Perceptual-  Temper-    Vocational  Reward
Performance           Ability    Ability    Psychomotor  ament      Interests   Pref
Construct             (K=4)(c)   (K=5)      (K=10)       (K=8)      (K=10)      (K=7)

Core Technical
  Proficiency           .63        .65        .64          .63        .64         .63
General Soldiering
  Proficiency           .66        .68        .67          .66        .66         .66
Effort and
  Leadership            .31        .32        .32          .42        .35         .33
Personal
  Discipline            .16        .17        .17          .35        .19         .19
Physical Fitness and
  Military Bearing      .20        .22        .22          .41        .24         .22

(a) Validity coefficients were corrected for range restriction and adjusted
    for shrinkage.
(b) Incremental validity refers to the increase in R afforded by the new
    predictors above and beyond the R for the Army's current predictor
    battery, the ASVAB.
(c) K is the number of predictor scores.
None of the predictor domains added more than .03 to the prediction of
Core Technical Proficiency or General Soldiering Proficiency. In both
instances, the composite that added the incremental validity was Spatial
Ability. However, the four Temperament predictor scores added .11 to the
prediction of Effort and Leadership, .19 to Personal Discipline, and .21 to
Physical Fitness and Military Bearing.
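
The increments quoted above are simple differences between columns of
Table 4-23. The short sketch below recomputes them for the Spatial Ability and
Temperament columns; the values are copied from the table, while the
dictionary layout and names are only an illustrative arrangement.

```python
# Mean validities from Table 4-23: (GCA alone, GCA plus Spatial,
# GCA plus Temperament), keyed by job performance construct.
table_4_23 = {
    "Core Technical Proficiency":            (0.63, 0.65, 0.63),
    "General Soldiering Proficiency":        (0.66, 0.68, 0.66),
    "Effort and Leadership":                 (0.31, 0.32, 0.42),
    "Personal Discipline":                   (0.16, 0.17, 0.35),
    "Physical Fitness and Military Bearing": (0.20, 0.22, 0.41),
}

spatial_gain = {}
temperament_gain = {}
for construct, (gca, plus_spatial, plus_temperament) in table_4_23.items():
    # Incremental validity: increase in mean R over General Cognitive Ability.
    spatial_gain[construct] = round(plus_spatial - gca, 2)
    temperament_gain[construct] = round(plus_temperament - gca, 2)

for construct in table_4_23:
    print(f"{construct:40s} spatial +{spatial_gain[construct]:.2f}"
          f"  temperament +{temperament_gain[construct]:.2f}")
```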

Overall, the results are consistent with the hypotheses that: (a)
cognitive ability would be the most valid predictor of Core Technical
Proficiency and General Soldiering Proficiency; (b) non-cognitive composites
would be the most valid predictors of Personal Discipline and Physical Fitness
and Military Bearing; and (c) both cognitive and non-cognitive predictors
would be useful for predicting Effort and Leadership.

Predictor  Relationships  With  Criterion  Residual  Scores 

Another method of studying the construct validity of both predictors and
criteria is to examine how the pattern of predictor-criterion relationships
changes when the variance attributable to the methods factors is removed from
the five performance construct scores. These results are presented in Table
4-24.


To compute residual performance construct scores, the variance
attributable to the written test factor was partialed from the scores for Core
Technical Proficiency and General Soldiering Proficiency, and the variance
attributable to the rating factor was partialed from the scores for Effort and
Leadership, Personal Discipline, and Physical Fitness and Military Bearing.

The table shows that the residual scores for Core Technical Proficiency
and General Soldiering Proficiency were less predictable than the raw scores.
However, the level of prediction is still substantial even when all variance
attributable to the paper-and-pencil measurement mode is partialed out. One
strong conclusion is that measurement method does not explain away the
validity of ASVAB.

For Effort and Leadership, the cognitive predictor scores predicted the
residual performance construct scores better than they predicted the raw
performance construct scores. For example, the mean R of the General
Cognitive Ability composite rose from .31 to .46. The increase was .16 for
the Spatial composite and .12 for the Perceptual-Psychomotor composite. For
the ABLE composite, the results were reversed and the multiple correlation
decreased from .33 to .31. The Vocational Interests composite and the Job
Reward Preferences composite "behaved" similarly to the Cognitive Ability
composite. The mean Rs were greater for the residual Effort and Leadership
score than for the raw Effort and Leadership score.
Table 4-24

Mean Validity(a) for the Composite Scores Within Each Predictor Domain
Across Nine Army Enlisted Jobs

                                              Predictor Domain
                            General              Perceptual-                   Job
Job                 Type    Cognitive  Spatial   Psychomotor  Temper-  Voc     Reward
Performance         of      Ability    Ability   Ability      ament    Inter   Pref
Construct           Score   (K=4)(b)   (K=1)     (K=6)        (K=4)    (K=6)   (K=3)

Core Technical      Raw       .63        .56       .53          .26      .35     .29
  Proficiency       Resid     .47        .37       .37          .22      .28     .21

General Soldiering  Raw       .65        .63       .57          .25      .34     .30
  Proficiency       Resid     .49        .48       .41          .21      .26     .22

Effort and          Raw       .31        .25       .26          .33      .24     .19
  Leadership        Resid     .46        .41       .38          .31      .32     .27

Personal            Raw       .16        .12       .12          .32      .13     .11
  Discipline        Resid     .19        .15       .13          .28      .15     .10

Physical Fitness    Raw       .20        .10       .11          .37      .12     .11
  and Military      Resid     .21        .11       .14          .35      .14     .10
  Bearing

(a) Validity coefficients were corrected for range restriction and adjusted
    for shrinkage.
(b) K is the number of predictor scores.




This pattern of correlations for Effort and Leadership suggests two
interesting conclusions. First, it provides additional evidence that the
Vocational Interests scores are more similar to cognitive predictors than to
temperament predictors. Second, the changes in correlations suggest that
Effort and Leadership becomes more like a "can do" performance construct when
the rating method factor is partialed out. However, the residual Effort and
Leadership score continues to reflect the "will do" portion of the job
performance space as suggested by its highest Rs. Thus, the residual Effort
and Leadership score appears to tap both "can do" or maximal job performance
and "will do" or typical job performance.

Partialing the rating factor from the Personal Discipline and the
Physical Fitness and Military Bearing scores had little impact on the
correlations of these scores with the predictor composites.

Stepwise multiple regression solutions within each of the six categories
of predictor constructs are shown in Tables 4-25 and 4-26. The regression
equations in Table 4-25 were computed on the combined samples from the nine
MOS in Batch A for each of the last four Army-wide performance factors (i.e.,
General Soldiering, Effort/Leadership, Personal Discipline, and Physical
Fitness/Military Bearing). The coefficients were computed on the combined
samples because a series of analyses of variance had shown few Predictor by
MOS interactions when the dependent variable was one of the four Army-wide
factors. However, the profile of regression coefficients for predicting the
Core Technical Proficiency factor was significantly different across MOS. The
MOS by MOS stepwise regression solutions within predictor category are shown
in Table 4-26.

For the four Army-wide components, some comparisons of interest are the
following:

•  Among ASVAB scores the quantitative and technical
scores contribute the most to the prediction of
General Soldiering Proficiency. The verbal score
plays a more prominent role in the prediction of the
Core Technical performance factor.

•  While ASVAB does not contribute much to the prediction
of performance factors 4 and 5, the ASVAB technical
score does make a relatively large contribution to the
prediction of factor 3, the Effort/Leadership factor.

•  The differential contributions of the temperament
(ABLE) scores to prediction of performance factors 3,
4, and 5 are clear, significant, and pronounced. The
profiles look like they should.

•  The combat interests score was the most predictive
interest score among the scores generated from the
AVOICE.
Table 4-25

Results of Stepwise Regressions Within Each Predictor Domain for the
Four Army-Wide Performance Constructs Across All Nine Batch A MOS

                                          Criterion Construct
                         General      Effort and     Effort and   Personal     Phys Fitness/
Predictor                Soldiering   Leadership     Leadership   Discipline   Mil Bearing
Construct                (raw score)  (resid score)  (raw score)  (raw score)  (raw score)

ASVAB Factors
  Verbal                   .10          .03           -.07         -.03         -.11
  Quantitative             .20          .08            .03          .07          .03
  Technical                .26          .21            .21          .06         -.05
  Speed                    .03          .07            .09          .04          .10
  ADJ. UNCORR R            .(61         .280           .206         .106         .161

Spatial
  Overall Spatial          .47          .25            .14          .07         -.05
  UNCORRECTED R            .466         .253           .142         .068         .047

Computer
  Complex Perc Speed      -.09         -.06           -.07          --
  Complex Perc Accy        .19          .07            .09          .05          --
  Number Speed/Accy       -.14         -.06           -.09         -.03          --
  Psychomotor             -.19         -.08           -.10          --           --
  Simple Reaction Accy     .04          --             --          -.06
  Simple Reaction Speed    --           --             --           --          -.07
  ADJ. UNCORR R            .J63         .149           .208         .032         .071

Temperament
  Adjustment               .09          .04            .03          .03          --
  Dependability,
  Achievement              .04  .04  .23  .06  .25  .30  .12  .12
  Phys Condition          -.06          --            -.06          .24
  ADJ. UNCORR R            .129         .255           .303         .303         .356

Interests
  Combat                   .24          .20            .17          --           .04
  Machines                 --           --            -.04         -.06
  Audiovisual              --
  Technical                --           .06            .08          .09          .14
  Food Service            -.10         -.16           -.12          .06         -.05
  Protective Svc          -.06          --            -.09
  ADJ. UNCORR R            .229         .235           .199         .078         .119

Job Values
  Security                 --           .03            .05          .05          .10
  Autonomy                 .05          .07            .03         -.06         -.05
  Routine                 -.11         -.12           -.09         -.03         -.02
  ADJ. UNCORR R            .123         .150           .112         .063         .097
Table 4-26

Results of Stepwise Regressions Within Each Predictor Domain for MOS-Specific
Core Technical Proficiency for Each of the Nine Batch A MOS

                                                    MOS

Predictor Construct           11B    13B    19E    31C    63B    64C    71L    91A    95B

ASVAB Factors
  Verbal                      .20     --    .13    .19     --    .16    .29    .11
  Quantitative                .14    .09    .15    .14     --    .14    .38    .12    .15
  Technical                   .23    .23    .27    .23    .55    .34    .11    .19    .11
  Speed                       .10     --    .11     --     --    .08    .17    .09
  ADJ. UNCORR R              .503   .254   .452   .427   .538   .413   .441   .455   .282

Spatial
  Overall Spatial             .48    .33    .43    .32    .41    .37    .41    .38    .28
  UNCORRECTED R              .475   .334   .432   .315   .412   .366   .111   .380   .275

Computer
  Complex Perc Speed         -.25   -.10     --     --   -.08   -.14     --     --     --
  Complex Perc Accy           .29    .11    .15    .13     --    .19    .27    .09    .13
  Number Speed/Accy          -.11   -.11   -.20   -.25   -.08   -.07   -.22   -.20   -.19
  Psychomotor                -.13   -.17   -.15   -.09   -.20   -.10     --   -.15   -.09
  Simple Reaction Accy         --     --    .12     --    .08    .07     --    .08     --
  Simple Reaction Speed        --     --     --     --     --     --     --     --     --
  ADJ. UNCORR R              .406   .257   .343   .253   .242   .269   .325   .251   .228

Temperament
  Adjustment                  .12    .14    .10     --     --    .10    .08
  Dependability / Achievement
      --  .19  --  --  .08  .10  --  .09  --  --  .10  .14  .19  --  .12  --
  Phys Condition               --     --   -.13     --   -.12     --   -.10   -.15     --
  ADJ. UNCORR R              .143   .000   .129   .000   .119   .000   .175   .211   .114

Interests
  Combat                      .25    .25    .25     --    .11    .09    .12    .18     --
  Machines                     --    .10     --    .13    .38    .09   -.23     --     --
  Audiovisual                  --     --     --   -.11     --     --   -.08
  Technical                   .08     --    .10     --    .19     --     --
  Food Service               -.22   -.16   -.11   -.10   -.12   -.07     --   -.06
  Protective Svc             -.11   -.10     --     --   -.14     --     --     --
  ADJ. UNCORR R              .276   .255   .218   .000   .441   .135   .160   .039   .000

Job Values
  Security                     --     --     --     --    .14     --
  Autonomy                    .08    .17     --     --    .14    .11     --     --     --
  Routine                    -.15   -.14   -.21     --   -.10   -.07   -.12     --   -.08
  ADJ. UNCORR R              .141   .201   .165   .000   .133   .080   .038   .058   .000


For the MOS by MOS stepwise regression coefficient profiles used to
predict the Core Technical factor (i.e., Table 4-26), the greatest
differential is within the ASVAB and the AVOICE, and to a lesser extent within
the spatial and computerized tests.

To look at the coefficients in another way, stepwise regressions were
carried out when all 24 predictor scores were used to predict each performance
factor. Again, the analyses for the four Army-wide criterion factors were
carried out on a combined sample while the analyses against the Core Technical
factor were done MOS by MOS. The results are shown in Tables 4-27 and 4-28.
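
The report does not spell out the entry rule used in its stepwise runs. As a rough illustration only, the sketch below shows one common variant, greedy forward selection on R-squared. The function name `forward_stepwise` and the stopping threshold `min_gain` are assumptions made for this example, not Project A's actual criterion.

```python
import numpy as np

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection on R^2 (illustrative entry rule only)."""
    n, p = X.shape
    selected, best_r2 = [], 0.0
    while len(selected) < p:
        r2_by_candidate = {}
        for j in range(p):
            if j in selected:
                continue
            # Fit an intercept plus the already selected predictors and candidate j.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            r2_by_candidate[j] = 1.0 - resid.var() / y.var()
        j_best = max(r2_by_candidate, key=r2_by_candidate.get)
        # Stop when the best remaining predictor adds too little (assumed threshold).
        if r2_by_candidate[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        best_r2 = r2_by_candidate[j_best]
    return selected, best_r2
```

Called with a matrix of predictor scores and one criterion factor, the function returns the column indices in order of entry together with the final R-squared. Predictors that never enter correspond to the dashes in the stepwise tables.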

Again the differential patterns appear across the four Army-wide
performance factors and across MOS for the Core Technical factor. However, a
surprise was the strong role played by the spatial and the combat interest
constructs in predicting the technical performance factor in the combat
specialties.

To round out the picture, the zero-order correlations (validity
coefficients) corresponding to the regression coefficients in Tables 4-27 and
4-28 are shown in Tables 4-29 and 4-30.

Summary 

At this point, Project A had reached a number of its basic goals.

•  Multiple criterion measures had been developed and used to
formulate five components of job performance.

•  ASVAB was shown to be a highly valid predictor of job
performance as reflected in the Core Technical performance
and General Soldiering performance components.

•  There was a considerable differential prediction for the
total test battery across the five performance components
within each MOS.

•  The non-cognitive predictors added significantly to the
prediction of the "will-do" components of performance and
should prove to be valuable additions to the total system.

•  As was expected, differential prediction across MOS was
limited largely to the Core Technical performance factor.

Both the ASVAB and the new experimental cognitive tests
should contribute to differential prediction equations
across major MOS clusters. However, the full analyses
necessary to determine the prediction equations remain
to be done.

Table 4-27

Results of Stepwise Regressions for the Four Army-Wide Performance Constructs
Across All Nine Batch A MOS

                                          Criterion Construct

                           General       Effort and    Effort and    Personal      Phys Fitness/
Predictor                  Soldiering    Leadership    Leadership    Discipline    Mil Bearing
Construct                  (raw score)   (resid score) (raw score)   (raw score)   (raw score)

ASVAB Factors
  Verbal                     .69          -.63          -.06            --          -.10
  Quantitative               .09           .04            --           .05            --
  Technical                  .12           .11           .15           .07          -.03
  Speed                       --           .04           .06           .03           .66

Spatial
  Overall Spatial            .25           .13            --            --            --

Computer
  Complex Perc Speed          --          -.05            --            --
  Complex Perc Accy / Number Speed/Accy
      .08  --  .04  --  -.02  --  --  .03  --
  Psychomotor               -.04            --          -.02            --            --
  Simple Reaction Accy        --            --            --            --          -.04
  Simple Reaction Speed     -.03            --            --            --          -.05

Temperament
  Adjustment                  --            --            --            --            --
  Dependability / Achievement
      .11  -.04  .05  .15  .11  .20  .30  .03  .09  .14
  Phys Condition             .03            --          -.05           .22

Interests
  Combat                     .13           .19            --           .04
  Machines                    --            --            --            --          -.05
  Audiovisual                 --          -.02          -.04          -.03           .04
  Technical                   --            --            --
  Food Service              -.04          -.08          -.05          -.04
  Protective Svc             .03          -.03          -.05

Job Values
  Security                    --            --            --
  Autonomy                    --          -.05          -.04
  Routine                   -.03          -.04          -.03            --            --

ADJ. UNCORR R               .540          .392          .366          .317          .385

Table 4-28

Results of Stepwise Regressions for MOS-Specific Core Technical Proficiency
for Each of the Nine Batch A MOS

                                                    MOS

Predictor Construct           11B    13B    19E    31C    63B    64C    71L    91A    95B

ASVAB Factors
  Verbal                      .17    .10    .21     --    .08    .26    .13
  Quantitative                .09     --    .30     --     --    .27     --     --
  Technical                   .10     --    .16    .35    .30   -.13    .12     --
  Speed                        --     --     --   -.07     --    .13     --

Spatial
  Overall Spatial             .20    .25    .19     --    .14    .16    .25    .23    .22

Computer
  Complex Perc Speed          .18     --     --     --   -.12     --     --
  Complex Perc Accy / Number Speed/Accy
      .13  --  .09  -.10  --  .14  .15  --  .09  --  --  -.09  --  --  --  --  -.11
  Psychomotor                  --     --     --     --     --     --     --
  Simple Reaction Accy         --     --    .07     --     --     --     --     --
  Simple Reaction Speed        --   -.10     --     --   -.11     --     --     --     --

Temperament
  Adjustment                 -.08     --     --   -.09     --     --     --     --
  Dependability               .12     --    .10    .15    .13    .07    .11    .22    .12
  Achievement                  --     --     --     --     --     --     --     --
  Phys Condition               --   -.09     --   -.06     --   -.13

Interests
  Combat                      .15    .21    .17     --     --   -.16    .16     --
  Machines                     --     --    .21    .32     --
  Audiovisual                  --     --   -.14     --   -.09   -.13
  Technical                    --     --    .12     --
  Food Service               -.07     --     --     --     --     --
  Protective Svc               --   -.08     --   -.08     --     --

Job Preferences
  Security                     --     --     --     --    .09     --    .12    .09
  Autonomy                     --    .09     --   -.11     --     --     --
  Routine                    -.06   -.11     --     --     --     --    .07     --

ADJ. UNCORR R                .860   .308   .464   .352   .591   .401   .481   .507   .294

Table 4-29

Correlations Between the Predictor Constructs and the Army-Wide Criterion
Constructs Combined Across Batch A MOS*

                                          Criterion Construct

                           General       Effort and    Effort and    Personal      Phys Fitness/
Predictor                  Soldiering    Leadership    Leadership    Discipline    Mil Bearing
Construct                  (raw score)   (resid score) (raw score)   (raw score)   (raw score)

ASVAB Factors
  Technical                  .55           .39           .28           .12          -.08
  Verbal                     .52           .35           .20           .10          -.07
  Quantitative               .54           .36           .23           .14          -.01
  Speed                      .37           .29           .21           .11           .07

Cognitive Constructs
  Overall Spatial            .59            .U           .24           .11          -.03

Computer Constructs
  Complex Perc Speed        -.21          -.17          -.03          -.04
  Complex Perc Accy          .30           .18           .12           .08          -.01
  Number Speed/Accy         -.44          -.31          -.21          -.09          -.01
  Psychomotor               -.40          -.27          -.20          -.04          -.01
  Simple Reaction Accy       .18           .09           .05           .05          -.05
  Simple Reaction Speed     -.19          -.13          -.08          -.01          -.06

ABLE Constructs
  Adjustment                 .18           .22           .23           .13           .17
  Physical Condition        -.03           .09           .10          -.02           .30
  Dependability / Achievement
      .09  .16  .15  .30  .21  .33  .30  .20  .22  .27

AVOICE Constructs
  Audiovisual Arts           .02           .02           .01           .00           .07
  Combat Related             .23           .22           .19           .00           .03
  Food Service              -.12          -.14          -.11          -.06           .00
  Structural/Machines        .06           .06           .06          -.05          -.01
  Protective Services       -.04           .03           .04          -.04           .02
  Skilled Technical          .04           .07           .06           .05           .11

Job Constructs
  Autonomy                   .13           .15           .09          -.02          -.02
  Routine                   -.21          -.20          -.15          -.06          -.04
  Job Security               .09           .11           .10           .05           .09

*Corrected for range restriction.


Table 4-30

Correlations Between the Predictor Constructs and Core Technical Proficiency(a)

                                                    MOS

Predictor Construct           11B    13B    19E    31C    63B    64C    71L    91A    95B

ASVAB Factors
  Technical                   .60    .36    .86    .89    .69    .55    .37    .61    .51
  Verbal                      .63    .33    .49    .67    .80    .44    .56    .71    .59
  Quantitative                .60    .32    .49    .67    .48    .46    .63    .64    .59
  Speed                       .48    .28    .28    .87    .29    .27    .82    .86    .47

Cognitive Construct
  Overall Spatial             .63    .41    .85    .58    .56    .51    .57    .64    .56

Computer Constructs
  Complex Perc Speed          .33   -.15   -.17    .28   -.24   -.25    .11   -.28   -.20
  Complex Perc Accy           .38    .24    .32    .22    .16    .28    .40    .25    .26
  Number Speed/Accy           .48   -.30   -.42    .62   -.37   -.38    .50   -.57   -.53
  Psychomotor                 .43   -.30   -.36    .34   -.36   -.34    .26   -.44   -.32
  Simple Reaction Accy        .17    .11    .26    .17    .14    .19    .27    .16    .20
  Simple Reaction Speed       .17   -.19   -.15    .10   -.23   -.19    .11   -.21   -.23

ABLE Constructs
  Adjustment                  .26    .13    .18    .06    .21    .07    .20    .12    .27
  Physical Condition          .06   -.04   -.09    .18   -.13   -.07    .12   -.09   -.13
  Dependability               .16    .01    .09    .04    .00    .01    .21    .18    .24
  Achievement                 .31    .06    .16    .14    .20    .09    .27    .22    .25

AVOICE Constructs
  Audiovisual Arts            .04   -.08   -.01    .20   -.14    .00    .19    .13   -.14
  Combat Related              .23    .21    .31    .08    .31    .24    .02    .22    .03
  Food Service                .30   -.14   -.14    .01   -.20   -.14    .03   -.09   -.19
  Structural/Machines         .12    .09    .06    .05    .41    .16    .19    .01   -.19
  Protective Svc              .08   -.08   -.04    .01   -.10   -.05    .01   -.13   -.16
  Skilled Technical           .07   -.03    .09    .12   -.08    .00    .17    .00   -.03

Job Preferences
  Autonomy                    .21    .22    .09    .22    .25    .21    .21    .23    .09
  Routine                     .27   -.18   -.27    .19   -.21   -.20    .19    .22   -.30
  Job Security                .14    .13    .08    .02    .06    .14    .20    .18   -.01

(a) Corrected for range restriction.

The results summarized in this section were impressive and they have
formed the basis for modifications to the ASVAB Aptitude Area composites.
However, to realize the full benefit of these results, the following things
must happen. Both the covariance structures and the estimates of predictive
validity must be cross-validated with a genuine predictive design (i.e., the
Longitudinal Validation), rules for forming criteria composites must be
developed, the utility of accurate predictions must be estimated, the
specifics of the full selection/classification/promotion decision system must
be modeled, and the effects of using the new predictors in various
combinations under a variety of goals and constraints must be evaluated.


A method for obtaining criterion composites and subcomposites has been
developed, the utility of a complete set of MOS by performance level
combinations has been estimated (presented in Chapter 5), and the data from
the Longitudinal Validation sample have been collected. Further work remains
on the measurement of second-tour performance and on the full operational
model of the complete decision system.


WEIGHTING  CRITERION  COMPOSITES 


The Concurrent Validation results indicated that each of the five
criterion components can be predicted with considerable validity and that the
validity of the different predictor domains varies systematically across
criterion components. A subsequent focus was on the best method for obtaining
importance weights when the five components are combined into an overall
composite index of performance (Sadacca, Campbell, White, & DiFazio, 1988).
Consequently, weighting judgments were gathered from NCOs and officers
familiar with each MOS.

The Pilot Experiments

Three pilot experiments were conducted to select the construct weighting
procedure. The goal in conducting the experiments was to select one or more
construct weighting procedures that would be acceptable to the Army and would
yield a reliable, valid set of weights for each of the sampled MOS when the
procedures were applied by the appropriate subject matter experts. The
experiments and their results will be described briefly prior to describing
the actual factor weighting procedure.


The general procedure was that of a small group workshop of 10-16
officers who tried different methods and evaluated the ease of use,
acceptability, and perceived validity of each method. The reliabilities and
distributional properties of the assigned weights were also analyzed.


Experiment  One 


In the first experiment, three procedures were used and all involved
direct judgments of the relative weight for each performance construct in
forming an overall composite score. In procedure A, the officers were first
asked to rank order the constructs and to assign 100 points to the first
ranked. The other constructs were scaled so as to produce a ratio estimate.
In procedure B, the officers divided 100 points among the constructs in a
manner that reflected the relative weight. In procedure C, 15 pairs of
factors were presented in a paired comparison protocol. For the paired
comparisons, the order of presentation followed the optimization procedure
worked out by Ross (1934), and the officers' task was to divide 100 points
between the two constructs being judged in any given pair.

The judgments were made in the context of three different scenarios
which described a peacetime condition, a period of heightened tensions, and a
wartime setting in which hostilities had just broken out. Each officer used
four 7-point scales to evaluate the weighting methods on the following
dimensions:

(1)  Acceptability to the Army.

(2)  Ease of making the judgments called for by the method.

(3)  Their confidence in the validity of the judgments made.

(4)  The amount of agreement with other workshop participants that
could be expected.


After the ratings were completed, an informal discussion period was held
to solicit opinions about the methods. The officers generally expressed
preference for procedures A and C over procedure B and thought that the time
they spent worrying about whether the sum of their weights equaled 100
detracted from their ability to judge the relative importance of the weights.
It also seemed that a heightened tension scenario would evoke a more uniform
frame of reference across the many different kinds of SMEs providing the MOS
construct weights.

Experiment  Two 


The second pilot experiment used two additional methods, both variants
of a conjoint procedure, in two 4-hour workshops. One was attended by 15
officers, the other by 15 NCOs. The three weighting methods are described in
the following instructions to the participants:

(1)  Rank order the five constructs, assign 100 points to the
first ranked construct, and then scale the other constructs
accordingly (same as procedure A in Experiment 1).

(2)  Based upon their scores on the separate constructs, rank
order 25 infantrymen in order of their overall performance.
(For each of the infantrymen, a different set of performance
scores on the five constructs was given on 7-point scales
that range from the lowest level of performance to the
highest.)

(3)  Based upon their scores on two constructs, rank order 10
sets of 13 infantrymen in order of their overall performance.
(In each set, the performance scores on two
constructs are given on the same 7-point scales used in the
second method above. A set of 13 infantrymen is given for
each of the 10 possible pairs of the five constructs.)

The second and third methods are variants of the conjoint approach to
scaling. The judges' weights for the performance constructs are inferred from
the rank order given sets of hypothetical soldiers whose performance on the
constructs has been systematically varied. Both officers and NCOs generally
preferred the direct estimation method most and the conjoint full profile
method least.
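
To make the idea of inferring weights from rank orders concrete, the following sketch regresses one judge's ranking of a set of hypothetical soldiers on the two construct scores shown for that set, then returns the ratio of the two regression weights. It is an illustration under stated assumptions: the function name `pair_weight_ratio` and the nine-soldier example are hypothetical, and the workshop materials themselves are not reproduced in this report.

```python
import numpy as np

def pair_weight_ratio(scores_a, scores_b, ranks):
    """Regress a judge's ranking on two construct scores; return b_a / b_b."""
    # Rank 1 is the best soldier, so negate ranks to make higher mean better.
    y = -np.asarray(ranks, dtype=float)
    A = np.column_stack([
        np.ones(len(y)),
        np.asarray(scores_a, dtype=float),
        np.asarray(scores_b, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1] / coef[2]

# Hypothetical sheet: nine soldiers on a 3 x 3 grid of 7-point scores,
# so the two construct scores are uncorrelated by design.
scores_a = [1, 1, 1, 4, 4, 4, 7, 7, 7]
scores_b = [1, 4, 7, 1, 4, 7, 1, 4, 7]
# A judge who implicitly values construct A three times as much as B.
composite = [3 * a + b for a, b in zip(scores_a, scores_b)]
ranks = [sorted(composite, reverse=True).index(c) + 1 for c in composite]
ratio = pair_weight_ratio(scores_a, scores_b, ranks)
```

In this constructed case the ranking is an exact linear function of the two scores, so the recovered ratio matches the 3-to-1 weighting built into the example. Real judges' rankings would only approximate a linear rule.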

In general, the conjoint paired comparison method yielded the highest
intraclass reliability estimates for both the officers and NCOs while the
conjoint full profile method had the lowest values. The correlation between
the mean officer and NCO weights obtained from the conjoint paired comparisons
method also was the highest (r > .60). The mean weights obtained from the
direct estimation and the conjoint paired comparison methods were highly
correlated (r = .93) while the correlations of these weights with those
obtained from the conjoint full profile method were quite low.

On the basis of these results, it was decided to drop the conjoint full
profile method from further consideration.

Experiment  Three 

The third pilot study also involved two 4-hour workshops, composed of
seven officers and eight NCOs. Each participant used the three different
weighting methods described below.

Based on scores on two constructs, participants were asked to rank order
21 sets of 13 infantrymen in order of their overall performance. This is the
same conjoint paired comparison procedure used in the second experiment, but
in addition, the judges assigned overall performance scores that reflected the
soldiers' relative overall performance.

The participants were then asked to rank order the constructs, assign
100 points to the first ranked construct, and then scale the other constructs
accordingly (the direct estimation procedure used in Experiments 1 and 2).

The third method was a variant of the second and incorporated a Delphi
procedure. Participants first indicated why they had ranked and weighted the
performance factors as they had in method 2 above. The reasons were passed
around to the other workshop participants. After considering this feedback
information, the participants reassigned weights to the performance factors,
using method 2 above. The Delphi procedure was then repeated once more.

Several inferences were made from the data. First, there was no
evidence that the one-rater reliabilities were improved substantially by
adding the requirement to provide overall performance scores in addition to
ranks in the conjoint paired comparison method. Nor were agreement indexes
improved by adding the requirement to obtain Delphi feedback.

The choice between the direct estimation method and the conjoint paired
comparison-ranking method was not clear-cut. The direct estimation method
generally received higher evaluation ratings in both Experiments 2 and 3 and
would obviously take less time to administer than the conjoint method. On the
other hand, the officer and NCO one-rater reliabilities obtained for the
conjoint method were somewhat higher in both experiments. However, both the
direct estimation and paired comparison methods had correlations between the
officer and NCO mean weights above .80 in both experiments. The correlations
between the mean weights obtained in Experiment 2 with those obtained in
Experiment 3 were very high for both methods (.96 for the direct estimation
and .97 for the conjoint method).

In  short,  both  appeared  to  be  sound  methods  and  it  was  decided  to  use 
both  to  obtain  the  construct  performance  weights  for  the  Project  A  MOS  sample. 


The component weights were collected in a series of 2-hour workshops.
Separate workshops were held for NCOs and officers at each of two posts for
each MOS. Of a total of 36 judges for each MOS,* half were to come from field
units (FORSCOM and USAREUR) and half from proponent posts (TRADOC). The
judges were to be evenly divided among NCOs, company grade officers, and field
grade officers. Table 4-31 shows the total sample of 702 judges subdivided by
MOS, type of post, and grade level. Although some individual MOS proportions
did not meet the target, overall the proportions of officers to NCOs and
judges from field units to proponent MOS posts were close to the desired
composition.

At each workshop, after a briefing on Project A, the participants were
first given general instructions which covered the background and purpose of
the workshop, and descriptions of the performance components (constructs) and
the two methods (direct estimation and conjoint paired comparison-ranking)
that would be used to obtain weights for the components. The components to be
weighted were the five job performance criterion factors developed as part of
Project A's performance modeling effort. The two scaling methods were then
administered, always in the same order.


To better reflect the combined judgments of the construct weights across
the judges for each MOS, the data from each judge were standardized prior to
averaging. For the direct estimation method, the average of the five
construct weights of all judges was set at 20.0, and the average of the five
weights for any group of judges within and across MOS was also set at 20.0.

The mean weight of a given construct obtained by averaging the judges'
individual weights could, of course, be different from 20.0.
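
The standardization described above can be read as rescaling each judge's five weights so that they average 20.0 (and so sum to 100) before the weights are averaged across judges. The sketch below follows that reading. The function names `standardize_weights` and `mean_construct_weights` are illustrative assumptions, and the report does not give its exact computation.

```python
def standardize_weights(raw, target_mean=20.0):
    """Rescale one judge's raw weights so that they average target_mean."""
    total = float(sum(raw))
    if total <= 0:
        raise ValueError("raw weights must have a positive sum")
    scale = target_mean * len(raw) / total
    return [w * scale for w in raw]

def mean_construct_weights(judges):
    """Average the standardized weights construct by construct across judges."""
    standardized = [standardize_weights(j) for j in judges]
    k = len(standardized[0])
    return [sum(j[i] for j in standardized) / len(standardized) for i in range(k)]

# Hypothetical judges: one used the rank-and-scale format (100 for the top
# construct), the other gave every construct the same raw weight.
judges = [[100, 80, 60, 40, 20], [50, 50, 50, 50, 50]]
means = mean_construct_weights(judges)
```

Because every judge's standardized weights sum to 100, the construct means also sum to 100, while any single construct mean is free to fall above or below 20.0.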


*MOS 96B (Intelligence Analyst), which was added to the LV MOS sample to
[illegible] 20 MOS studied in this effort.

Table 4-31

Composition of Judging Sample* for Weighting Project A MOS

                                                                             Total
MOS                                  Officer     NCO Officer     NCO Officer     NCO

11B  Infantryman                          17       6      19       6      36      12
12B  Combat Engineer                      17       4      12       6      29      10
13B  Cannon Crewman                        6       6      21       6      27      12
16S  MANPADS Crewman                      11       6      11       5      22      11
19E  Armor Crewman                        11       5      14       6      25      11
27E  TOW/Dragon Repairer                  --       6      16       5      16      11
31C  Single Channel Radio Oper            13       6      12       6      25      12
51B  Carpentry/Masonry Specialist          4       6      27       6      31      12
54E  Chemical Operations Spec             20      14      --      --      20      14
55B  Ammunition Specialist                 4       3      24       9      28      12
63B  Light Wheel Vehicle Mechanic          7       2      20      11      27      13
64C  Motor Transport Operator             10       5      12       6      22      11
67N  Utility Helicopter Repairer          12       1      17      12      29      13
71L  Administrative Specialist            13       6       9       7      22      13
76W  Petroleum Supply Specialist          10      11      --      --      10      11
76Y  Unit Supply Specialist               15       5       8       5      23      10
91A  Medical Specialist                   25      13      --      --      25      13
94B  Food Service Specialist              12       7       8       4      20      11
95B  Military Police                      23      13      --      --      23      12

96B 

Intelligence  Analyst 

730 

175 

11 

m 

6 

T55 

11 

m 

6 

73T 

*In addition to the 702 officers and NCOs listed in this table, there were
10 judges whose grades were unknown, making the total sample 712.


For the conjoint method, the data from each judge were scaled using a
method developed by Comrey (1950) which is described in Torgerson (1958).
Essentially, the multiple regression equation predicting the judge's rank
order of the 15 hypothetical soldiers from their two performance construct
scores was first obtained for each of the 10 sets of soldiers. The ratio of
the two regression weights for each pair of constructs then became the basic
data entering into the scaling procedure. Since the correlation between the
two construct scores of the 15 hypothetical soldiers on each performance
rating sheet was specified to be zero, each regression weight is directly
proportional to the correlation of the corresponding set of construct scores
with the judge's rank order of the soldiers. The means and standard
deviations of the construct scores were equal for all constructs.
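
The proportionality noted above follows from the standard two-predictor regression formulas. As a short worked step, with $x_1$ and $x_2$ denoting the two construct scores, $y$ the judge's rank order, $r$ the correlations, and $s$ the standard deviations (notation introduced here for illustration, not taken from the report):

```latex
% Standardized regression weights for two predictors
\beta_1 = \frac{r_{1y} - r_{12}\, r_{2y}}{1 - r_{12}^{2}}, \qquad
\beta_2 = \frac{r_{2y} - r_{12}\, r_{1y}}{1 - r_{12}^{2}}

% With r_{12} = 0 by design, the weights reduce to the zero-order correlations
\beta_1 = r_{1y}, \qquad \beta_2 = r_{2y}

% Raw-score weights are b_j = \beta_j \, s_y / s_j, and s_1 = s_2 by design, so
\frac{b_1}{b_2} = \frac{r_{1y}}{r_{2y}} \cdot \frac{s_2}{s_1} = \frac{r_{1y}}{r_{2y}}
```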

The scaling procedure employs a least squares solution to obtain a set of
weights that best fit the observed ratios. The resultant weights are so
scaled that their geometric mean is 1.0. To facilitate the comparison of the
conjoint weights to those obtained by the direct estimation method, the
conjoint weights for each judge were also linearly transformed so that their
sum was equal to 100 and their average equal to 20.0.
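
One common way to realize a least squares fit to pairwise ratios is to work in logarithms, where the solution is the row mean of the log-ratio matrix and the resulting weights have a geometric mean of 1.0. The sketch below uses that formulation as an assumption. It is not claimed to reproduce Comrey's (1950) procedure step for step, and the function names `scale_from_ratios` and `rescale_to_sum` are hypothetical.

```python
import math
from itertools import combinations

def scale_from_ratios(ratios, k):
    """Log-least-squares weights from ratios {(i, j): w_i / w_j}, i < j.

    Assumes every one of the k * (k - 1) / 2 pairs has been observed.
    """
    log_r = [[0.0] * k for _ in range(k)]
    for (i, j), r in ratios.items():
        log_r[i][j] = math.log(r)
        log_r[j][i] = -math.log(r)  # reciprocal entry
    # Row means of the antisymmetric log-ratio matrix sum to zero,
    # so the exponentiated weights have geometric mean 1.0.
    return [math.exp(sum(row) / k) for row in log_r]

def rescale_to_sum(weights, total=100.0):
    """Multiply by a constant so that the weights sum to `total`."""
    s = sum(weights)
    return [w * total / s for w in weights]

# Hypothetical judge whose pairwise ratios are perfectly consistent.
true_w = [4.0, 2.0, 2.0, 1.0, 1.0]
ratios = {(i, j): true_w[i] / true_w[j] for i, j in combinations(range(5), 2)}
unit_gm = scale_from_ratios(ratios, 5)
final = rescale_to_sum(unit_gm)
```

With consistent ratios the procedure recovers the underlying weights up to scale. With inconsistent judgments it returns the compromise that minimizes squared error in the log ratios.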

Interjudge Reliability and Intermethod Agreement

The NCO 1-rater and n-rater reliabilities for the direct estimation and
conjoint scaling methods were .132/.425 and .153/.509, respectively. The
corresponding values for officers were .278/.864 and .287/.867.
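
The report does not say how the n-rater values were derived from the 1-rater values or what n was used. A conventional link between the two is the Spearman-Brown step-up, sketched below as an assumption. The function names are illustrative.

```python
def n_rater_reliability(r1, n):
    """Spearman-Brown step-up: reliability of the mean of n raters."""
    return n * r1 / (1.0 + (n - 1) * r1)

def implied_n(r1, rn):
    """Number of raters that would step r1 up to rn under Spearman-Brown."""
    return rn * (1.0 - r1) / (r1 * (1.0 - rn))
```

Under this assumption, `implied_n` can be used to check whether a reported pair of 1-rater and n-rater values is consistent with a given number of judges. Any mismatch would suggest the report's figures were computed differently, for example by averaging within-MOS estimates.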

The correlations across the 20 MOS of the average weights derived from
the direct estimation and conjoint scaling methods using officer judgments
ranged from .836 to .996; the average intermethod agreement was .951. The
corresponding range for the NCOs was .017 to .922 and their average MOS
intermethod agreement was .653. These intermethod results reflect in part the
lower 1-rater reliabilities obtained for the NCOs under both methods; also,
there were fewer NCO judges.

Comparison of the Direct Estimation and Conjoint Scaling Methods

To decide whether the final sets of weights should be obtained from the
direct estimation or the conjoint method, the two sets of weights were
compared on several indexes. Though the differences were in general slight,
they all favored the conjoint method. The 1-rater and n-rater intraclass
reliabilities for the combined group of officers and NCOs tended to be
slightly higher for the conjoint method across the 20 MOS. While the
differences between the reliabilities for the two scaling methods were
slightly greater for the NCOs than for the officers, the difference favored
the conjoint method in each case.

Also, the weights assigned the constructs by the NCOs correlated higher
with those assigned by the officers when the conjoint scaling method was used.

The Final Weight Estimates

Considering the above findings, the decision was made to favor the
weights derived from the conjoint scaling method in combining the individual
construct scores into an overall composite measure of performance. They are
shown in Table 4-32.

It should be borne in mind that the weights are based on comparative
judgments of the constructs within each MOS and should not be used for
comparisons of importance across MOS. Nevertheless, it is interesting to note
whether the relative pattern of weights differs across MOS and whether some
constructs are fairly consistently given relatively higher weights than
others.

Table 4-32

Mean Construct Weights by Grade and MOS: Conjoint Method

For all 20 MOS, Physical Fitness/Military Bearing received the lowest
relative weight. In 13 of the 20 MOS, Core Technical Skills received the
highest relative weight, while the Effort/Leadership construct was second
overall. The Effort/Leadership component received the highest relative weight
in 6 of the 20 MOS. For the most part, the Core Technical construct received
the highest weight for the technical MOS in the sample and the
Effort/Leadership construct received the highest weight for the combat MOS.
The General Soldiering construct received the highest weight for only one MOS,
Military Police (95B). These MOS differences in the constructs receiving the
highest weights undoubtedly contributed to the significant Construct by MOS
interaction.

Significant mean differences between the weights assigned by the
officers and NCOs were found for two constructs: Officers gave significantly
higher relative weights to the Effort/Leadership construct than did NCOs,
while NCOs gave higher weights to the Physical Fitness/Military Bearing
construct than did officers. The NCOs may have been giving relatively more
weight to aspects of first-tour soldiers' performance that were of more
immediate concern to them. Although the mean difference was significant
only at the .10 level, the NCOs gave the Personal Discipline construct
weights that were higher on the average than those assigned by the officers.


SUMMARY

The five Project A performance constructs received significantly
different patterns of weights in different MOS, and the different groups of
experts agreed, in general, on the relative ranking of the weights. For
example, the Effort/Leadership construct tends to be rated highest among the
combat MOS.

Multiple judges per MOS, about 30 on the average, produced n-rater
reliabilities that are quite respectable (above .95 for most MOS). The high
intermethod correlations (about .95 on the average) between the construct
weights obtained by the direct estimation and conjoint methods for the
separate MOS further document the reliability of the means of the scaled
weights.

That different groups of judges may provide somewhat different MOS
weights can be seen in the relatively low correlations between the officer and
NCO weights. The NCOs tended to give relatively higher weights to the
Physical Fitness/Military Bearing construct, while the officers attached more
importance to the Effort/Leadership construct.

Though there were statistically significant differences in the mean
weights assigned under the three scenarios, the very small differences will
have little impact on the relative ranking of soldiers on the overall
performance composites for an MOS. A more critical question is how much impact
the weights themselves will have. That is, would a different set of predictors
be selected using a weighted composite for validation than would have been
selected if the constructs had been weighted equally? And, perhaps even more
importantly, would different classification assignments be made as a result of
using the scaled weights?

The answers to these questions obviously depend not only on the set of
weights used but on such factors as the intercorrelations among the construct
performance scores, the validity of the predictor battery, the amount of
differential prediction it affords across Army jobs, the MOS selection
standards in effect, and the assignment algorithms employed. The most
feasible way to address these issues is through a series of sensitivity
analyses that portray the effects of these parameters on selection and
classification validity. These analyses remain to be done.
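
One small building block of such a sensitivity analysis might look like the
sketch below, which correlates a weighted composite with an equally weighted
one. It is an illustration only: the scores and the scaled weights are
hypothetical values invented for the example (they are not Project A data),
and the function names `pearson` and `composite` are assumptions of the
sketch rather than anything used in the project.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def composite(scores, weights):
    """Weighted sum of construct scores for each soldier."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]

# Hypothetical standardized scores for six soldiers on the five constructs,
# in the order: Core Technical, General Soldiering, Effort/Leadership,
# Personal Discipline, Physical Fitness/Military Bearing.
scores = [
    [1.2, 0.8, 0.5, -0.2, 0.1],
    [-0.4, 0.1, 1.0, 0.9, -0.6],
    [0.3, -0.5, -0.8, 0.4, 1.1],
    [-1.1, -0.9, -0.3, -1.0, 0.2],
    [0.6, 1.0, 0.2, 0.7, -1.3],
    [-0.6, -0.5, -0.6, -0.8, 0.5],
]

scaled_weights = [0.30, 0.20, 0.25, 0.15, 0.10]  # hypothetical, sum to 1
equal_weights = [0.20] * 5

weighted = composite(scores, scaled_weights)
unweighted = composite(scores, equal_weights)
r = pearson(weighted, unweighted)
print(f"correlation between weighted and equal-weight composites: {r:.3f}")
```

A correlation near 1.0 would suggest that the choice of weights does little to
reorder soldiers. A full analysis would also have to vary the factors listed
above, such as predictor validity and the assignment algorithm.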
Chapter 5

SCALING THE UTILITY OF INDIVIDUAL PERFORMANCE


Finding a way to place value on different levels of job performance
across different MOS was one of the research objectives for Project A. Two
principal factors made it difficult to apply previous civilian research on
utility metrics and utility estimation to the Army context. First,
compensation practices in the Army are quite different from those in the
civilian sector. Salaries do not differ by MOS and thus cannot be used as an
index of a job's relative worth to the organization. Second, the Army is not
in business to provide products or services so as to maximize profit. Its
overall mission to be prepared to defend the United States against external
military threats makes it inappropriate to put a monetary value on success or
failure or to think of the utility of jobs in terms of their monetary benefit.
Thus dollars may not be an appropriate metric with which to evaluate a new
classification system aimed at maximizing preparedness for catastrophic
events. Nevertheless, military resources are not unlimited. Choices among
alternative personnel practices will have to be made, whether or not there is
an explicit utility metric on which to make comparisons.

The  utility  problem  for  Project  A  was  one  of  assigning  utility  values  to 
MOS-by-Performance-Level  combinations.  That  is,  if  it  is  true  that  personnel 
assignments  will  differ  in  value  to  the  Army  depending  on  the  specific  MOS  to 
which  an  assignment  is  made  and  on  the  level  at  which  an  individual  will 
perform  in  that  MOS,  then  a  classification  strategy  that  has  a  validity 
significantly  greater  than  zero  will  increase  in  value  to  the  extent  that  the 
differential  values  (utilities)  can  be  estimated  and  made  a  part  of  the 
assignment  system. 

The  problem  of  estimating  such  utility  values  was  composed  of  a  number 
of  specific  questions:  How  should  performance  levels  be  defined?  Should  it 
be  in  terms  of  general  performance  defined  only  as  relative  level  (e.g., 
percentiles),  with  behavioral  anchors  developed  by  means  of  critical  incident 
methodology?  Or  should  individual  performance  components  be  defined  and  then 
explicitly  weighted  for  combination  into  a  total  score?  What  is  the  most 
appropriate  metric  for  describing  the  relative  value  of  differential  assign¬ 
ments?  Since  the  dollar  metric  seemed  not  to  be  appropriate  for  the  Army 
context,  this  was  a  very  difficult  issue  for  Project  A.  It  required  an 
exploratory  approach. 

What  method(s)  should  be  used  to  estimate  utility?  Only  two  options 
seemed  even  possible.  First,  it  might  be  possible  to  relate  the  performance 
of  individuals  to  some  kind  of  "bottom  line"  measure  that  Army  management 
would  consider  an  appropriate  metric,  such  as  realistic  field  exercises.  The 
difficulties  with  this  approach  revolve  around  feasibility,  expense,  and  the 
necessity  for  equating  scores  in  some  way  across  MOS.  A  second  alternative 
was  to  appeal  to  scaling  technology  and  use  expert  judges  to  estimate  the 
relative  value  of  differential  personnel  assignments,  and  this  is  the  course 
that  was  followed. 

The  general  procedure  used  in  Project  A  to  obtain  utility  values  for 
different  levels  of  predicted  performance  in  each  MOS  was  divided  into  three 
phases.  Phase  one  was  exploratory  in  nature  and  intended  to  uncover  the  major 
issues.  The  goal  of  phase  two  was  to  evaluate  alternative  scaling  methods  and 
develop the procedure to be used. In phase three the selected methods were
used to obtain the final scale values. (See Sadacca, White, Campbell,
DiFazio, & Schultz, 1988.)

PHASE  ONE:  EXPLORING  ISSUES 

Phase one consisted of a series of seven small group workshops with Army
officers. Each workshop was divided into a period for trying out prototypic
judgment tasks and a period for open-ended discussion of issues. These
questions were used to guide the discussions:

(1) How shall measures of performance be weighted and overall
performance defined?

(2) What kinds of scaling judgments can officers reasonably be
asked to make?

(3) Are there major scenario effects on performance factor
weights and utility judgments?

(4) In what metric should the utility of enlisted personnel
assignments be expressed?

(5) What is the form of the relationship between performance
and utility within MOS?

(6) Who will make the best judges for the final scaling?

The prototypic judgment tasks that were tried out in phase one were of
the following general nature:

(1) Assignment of importance weights to performance factors.

(2) Rank ordering of overall utility of MOS x Performance Level
combinations when performance was defined in percentile terms.

(3) Ratio judgments of comparative utility for different MOS x
Performance Level combinations.

The specific reactions of each participant to the sample scaling tasks
were also used as items for general discussion.

Perhaps the most significant finding was that Army officers would be
willing and able to assign differential utility values across MOS and
performance levels. When asked their reaction to expressing the differential
worth or utility of soldiers in terms of dollars, the officers in the
workshops reacted very negatively to this concept, citing possible adverse
political consequences as well as internal Army morale problems if dollar
figures were placed on soldiers' worth.

Perhaps the next most significant finding was that fairly stable scale
values could be obtained from averaging across a relatively small number of
officer judges. In these exploratory trials there was considerable agreement
across workshops on the scale values assigned to selected MOS x Performance
Level combinations. Judges seemed to have a common frame of reference
concerning what different performance levels meant; and, in the absence of any
specification, everyone imposed the same scenario or context (i.e., being
prepared for a major conflict in Europe).

The workshop groups also agreed that the scenario(s) used should be free
of the detail that suggests greater or less utility for certain specific MOS.
An acceptable metric for expressing utilities of soldiers in wartime would be
the utility of a 50th percentile Infantryman (his value for the survival of
the unit and in replacing troop losses is much more readily apparent).
Directions to the judges should be reassuring concerning inconsistencies that
may occur in a long series of judgments.

PHASE TWO: EVALUATING METHODS

The second phase was devoted to developing and evaluating the final
procedures to be used in assigning utilities to performance levels in all
entry-level MOS. Several inferences were made from the exploratory findings
in earlier workshops. First, the apparent nonlinear relationships between
utility and performance found in some MOS would necessitate obtaining
judgments of the utility of at least five performance levels within each MOS.
Five data points would allow the derivation of a best fitting utility/
performance curve with two inflection points (if necessary) within an MOS.
Second, assigning utility scale values to at least five performance levels
in 276 MOS was much too onerous to assign to any one judge. Third, high
correlations between different methods suggested that a combination of methods
might allow the total scaling task to be accomplished more efficiently.

The goal was to place all 276 x 5 MOS/performance level combinations
on the same ratio scale, which would permit utilities to be summed across
individual MOS assignments in comparing selection/classification systems.
Consequently, an additional 12 workshops were conducted with small groups of
officers to try out various scaling methods. These included rank ordering,
paired comparisons, a conjoint scaling procedure, the sorting or placement of
MOS/performance level combinations into piles (i.e., a Thurstone sort), and
the direct estimation of ratio scale values using a standard MOS/performance
level set at 100. Of these techniques, the last two were the scaling
procedures eventually selected.

The rank ordering procedure produced much negative reaction because of
the time it took, the inability to assign ties, and the requirement to rank
some MOS at the very bottom.

A major change during phase two involved placing the judgments in a
selection and classification context. That is, the instructions were changed
to ask for judgments of the utility of predicted performance of Army
applicants or recruits rather than actual performance of incumbents (as had
been the case in earlier workshops). The judges were asked to assume that the
performance percentiles given were accurate estimates of future on-the-job
performance percentiles if the applicants or recruits were actually assigned
to the MOS. After this adjustment was made, none of the judges in subsequent
workshops objected to the basic concept of assigning differential utilities to
various MOS/performance levels.

Two variants of the method of paired comparisons were also tried out
using a limited number of MOS/performance level combinations. However, the
methodology was time consuming and the officers felt they should be allowed to
indicate that some applicants should not be selected at all. The judgment was
subsequently shifted from predicted performance levels of applicants to that
of recruits (selected applicants), thereby eliminating the "do not select"
alternative.

To divorce both troop strength and troop replacements from utility
judgments, judges were told that the field strength of all MOS was 70 percent
and that the problem of compensating for troop losses was being handled by
another part of the assignment algorithm and should not enter into their
judgments.

A conjoint scaling method was also tried out but the method was much too
difficult and time consuming for use in scaling this number of stimuli.

One method that did prove effective for making large numbers of scaling
judgments was the pile placement method, in which the judges sorted cards
containing MOS/performance level combinations into piles, based upon their
perceived utility or selection priority. Seven piles of predicted performance
utility were used, ranging from negative through zero utility to high utility.
The judges initially sorted 135 MOS/performance level combinations, then 210
combinations, and eventually 280 combinations without complaining about the
judgment burden.

Likewise, the ratio judgment method, in which judges evaluated MOS/
performance level utilities in relationship to that of a 90th percentile
Infantryman, was stepped up to 60 combinations without becoming burdensome.

The one-rater intraclass correlation reliability estimate for the pile
placement procedure was .58 and the comparable coefficient for the direct
ratio judgment was .65. These results indicated that satisfactory
reliabilities for mean utilities could be obtained by both methods if the
means were based upon 10 or more judges. The correlation between the mean
utilities assigned by the 12 officers to the 60 common combinations, using the
two methods, was .89.
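
The step from a one-rater reliability to the reliability of a mean across
several judges can be illustrated with the Spearman-Brown formula. The report
does not say which formula was used, so treating Spearman-Brown as the
step-up method is an assumption of this sketch, as is the function name
`spearman_brown`.

```python
def spearman_brown(single_rater_r: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters judges, given one-rater reliability.

    Assumes the judges are interchangeable (parallel) raters.
    """
    return n_raters * single_rater_r / (1 + (n_raters - 1) * single_rater_r)

# One-rater intraclass correlations reported in the text.
one_rater = {"pile placement": 0.58, "direct ratio judgment": 0.65}

for method, r1 in one_rater.items():
    for k in (1, 5, 10, 12):
        stepped_up = spearman_brown(r1, k)
        print(f"{method:22s} judges={k:2d} reliability={stepped_up:.3f}")
```

Under this assumption, ten judges give mean-utility reliabilities of roughly
.93 for pile placement and .95 for direct ratio judgment, which is in line
with the statement that 10 or more judges would be satisfactory.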

Considering all the information available from the first and second
phase workshops, Project A staff decided to use the pile placement and direct
ratio estimation methods in the final determination of the utilities of
approximately 276 MOS x 5 performance levels, or 1,380 combinations. The pile
placement method provided a means of reliably scaling the utility of large
numbers of combinations on an interval scale in a reasonable time period,
while the direct estimation method could be used to place a limited number of
combinations on a ratio scale having a meaningful zero point. If a set of
stimuli (MOS x Performance Level combinations) was scaled by both methods, the
data could be used to develop an algorithm for estimating ratio scale values
from interval scale values.

PHASE  THREE:  OBTAINING  A  COMPLETE  SET  OF  UTILITY  ESTIMATES 

The exploratory workshops were largely successful. Utility scale values
varied across MOS in a manner generally consistent with expectations, and
interjudge agreement was high enough to indicate that fairly stable scale
values could be obtained by averaging across officer judgments.

The next goal was to assign a utility to any predicted level of
performance for any entry-level MOS such that the values could be used to (a)
make classification decisions and (b) assess the net gain to the Army of using
new selection/classification procedures.

Erggctiyirfli 

The scaling task considered all MOS that required an ASVAB Aptitude Area
score for assignment; that is, 276 MOS times 5 levels or 1,380 MOS/performance
level combinations to be judged separately. To make the scaling task more
acceptable to the judges, seven separate sets were used. The first set of 12
MOS times 5 performance levels, or 60 combinations, was to be judged by all
judges as the basis for a common scale. The remaining 264 MOS were grouped
into six comparable subsets of 44 MOS each. Each deck thus contained 280
MOS/performance level combinations: 12 common plus 44 noncommon MOS times 5
performance levels.
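
The arithmetic of this design can be checked with a short sketch. The MOS
labels are placeholders and the split of noncommon MOS into decks is done by
simple slicing; how the actual "comparable subsets" were formed is not
described here, so that part is an assumption made only for illustration.

```python
N_MOS = 276      # MOS requiring an ASVAB Aptitude Area score
N_LEVELS = 5     # performance levels judged per MOS
N_COMMON = 12    # MOS judged by every judge
N_DECKS = 6      # decks of noncommon MOS

# Placeholder labels; the real MOS codes are not needed for the count.
mos_ids = [f"MOS{i:03d}" for i in range(1, N_MOS + 1)]
common = mos_ids[:N_COMMON]
noncommon = mos_ids[N_COMMON:]

per_deck = len(noncommon) // N_DECKS  # noncommon MOS per deck

decks = []
for d in range(N_DECKS):
    subset = noncommon[d * per_deck:(d + 1) * per_deck]
    # Each card is one MOS/performance level combination.
    cards = [(mos, level)
             for mos in common + subset
             for level in range(1, N_LEVELS + 1)]
    decks.append(cards)

total_combinations = N_MOS * N_LEVELS
unique_cards = {card for deck in decks for card in deck}

print(per_deck, [len(deck) for deck in decks], total_combinations)
```

Every deck shares the same 60 common cards, which is what allows the six
decks to be linked to one scale.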

Sample of Officers

To ensure a total sample of 60 officers (10 officers x 6 decks), utility
workshops were held at 6 CONUS Army posts and in USAREUR. Altogether, 74
field grade officers attended the workshops: 54 majors, 13 lieutenant
colonels, and 7 colonels.

After a brief overview of Project A, a description of the agenda, and
completion of a Background Information Sheet, the leader discussed three
critical assumptions:

(1) The military context is a period of heightened tensions with an
increasing probability that hostilities will break out in Europe,
Asia, the Caribbean, Latin America, and Africa. Some potential
enemies have nuclear and chemical capability and air parity does
exist.

(2) The overall MOS performance measure for each MOS represents
an optimally weighted combination of multiple performance
factors.

(3) The predicted performance levels for the recruits are
accurate. That is, the recruits will actually perform at
the predicted levels.

For the pile placement method, the judges were to sort the
MOS/performance level combinations into one of seven piles ranging from
positive, to zero, to negative utility. For the direct judgment method, the
participants wrote the value, 100, on the 90th percentile Infantryman card,
and then assigned a utility value to each of the remaining 59 MOS/performance
level combinations so as to establish a utility ratio using the 90th
percentile 11B as the standard. Zero and negative utility values were
permitted.
Reliability and Validity Analyses

After extensive outlier analysis, seven extremely atypical judges were
removed and all reliability and validity analyses, as well as the utility
value estimates, were based on the remaining 67. The n-rater reliabilities
for the six separate decks (based on an n of about 11 judges on the average)
ranged from .9m to .976 for the pile placement data. The n-rater (67 judges)
reliability for the direct judgment utilities of the common combinations was
.992. The corresponding reliability for the pile placements of the common
combinations (across all decks and the 67 judges) was .995. The correlation
obtained between the average scale values from the two methods across the 60
common combinations was .98.

This high correlation was not wholly attributable to judges simply
agreeing that good performance is worth more than poorer performance. This
can be seen in the correlations between average pile placement and direct
judgment utilities obtained when the correlations are computed across the 12
common MOS holding the performance percentile constant. These correlations
had an average value of .77. The n-rater (67 judges) reliabilities averaged
.89 and .82 respectively for the pile placement and direct judgment utilities,
when the reliabilities were computed for each percentile level separately.
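
Why the correlation is recomputed with performance level held constant can be
shown with a small sketch. All numbers are hypothetical and were chosen only
to make the effect visible: a strong performance-level effect shared by both
methods inflates the pooled correlation, while the within-level correlation
reflects agreement about MOS differences alone. The four MOS, the three
levels, and the helper `pearson` are assumptions of the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical level effects: percentile -> (mean pile, mean direct utility).
level_effect = {10: (2.0, 20.0), 50: (4.0, 60.0), 90: (6.0, 100.0)}
# Hypothetical MOS offsets on each scale for four MOS.
pile_offset = [-0.5, -0.2, 0.2, 0.5]
direct_offset = [-6.0, 4.0, -4.0, 6.0]

pooled_pile, pooled_direct, within = [], [], []
for level, (pile_base, direct_base) in level_effect.items():
    pile = [pile_base + o for o in pile_offset]
    direct = [direct_base + o for o in direct_offset]
    # Correlation across MOS with the percentile held constant.
    within.append(pearson(pile, direct))
    pooled_pile.extend(pile)
    pooled_direct.extend(direct)

pooled = pearson(pooled_pile, pooled_direct)
print(f"pooled r = {pooled:.2f}")
print("within-level r =", [round(r, 2) for r in within])
```

With these made-up values the pooled correlation is about .98 while each
within-level correlation is about .57, so the pooled figure by itself would
overstate how well the two methods agree on MOS differences.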

Comparison of Utility Ratings by Different Officer Specialties

Analyses were conducted to determine whether officers in different
military primary specialties assigned significantly different utilities to the
common MOS/performance level combinations. In all, only 10 of the more than
250 statistical tests run were significant at the .05 level, fewer than the
12 or so that chance alone would be expected to produce. Examination of
the significant differences that were obtained did not reveal any trend in the
data indicating that certain types of officers favored particular MOS or
performance levels.

Estimation of Ratio Scale Utilities From Pile Placement (Interval) Data

A basic objective of the overall research design was to place all 1,380
MOS/performance level combinations on the same utility scale. Using the
averages (across all judges) of the direct judgment utilities assigned the 60
common combinations as the dependent variable, and the pile placement of the
same common combinations as the basic independent variable, an equation was
derived expressing direct judgment utilities as a function of average pile
placement. This equation was then used to estimate the ratio scale values
(direct judgment utilities) for each group of judges.

Alternative regression equations as estimates of ratio scale utilities
from the pile placement data were evaluated on a hold-out sample of 20
combinations. The overall multiple correlations were very high, .97 on the
average, although in general the equations tended to underestimate the
utilities of the hold-out combinations having high actual utilities, and
slightly overestimate the utilities of the combinations having low actual
utilities. The best balance was achieved by using average pile placement and
both its square and cube as the independent variables. The signs of the
weights obtained formed a fairly consistent pattern, with average pile
placement always having a positive weight, and the square and cube of average
pile placement having negative and positive weights, respectively.
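
A regression of this form (utility on average pile placement, its square, and
its cube) can be sketched with ordinary least squares. The data below are
hypothetical: they are generated from made-up coefficients that merely follow
the sign pattern described above, and the functions `fit_cubic` and `predict`
are illustrative names, not the project's software.

```python
def fit_cubic(x, y):
    """Least squares fit of y = b0 + b1*x + b2*x**2 + b3*x**3."""
    rows = [[1.0, xi, xi ** 2, xi ** 3] for xi in x]
    n = 4
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for c in range(col, n + 1):
                a[k][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (a[i][n] - tail) / a[i][i]
    return b

def predict(b, x):
    """Estimated ratio scale utility for an average pile placement x."""
    return b[0] + b[1] * x + b[2] * x ** 2 + b[3] * x ** 3

# Hypothetical coefficients: positive linear, negative square, positive cube.
true_b = [-20.0, 12.0, -1.5, 0.3]
pile_means = [1.0 + 0.5 * i for i in range(13)]          # piles 1 through 7
direct_means = [predict(true_b, p) for p in pile_means]  # made-up utilities

b = fit_cubic(pile_means, direct_means)
print([round(v, 3) for v in b])
print(round(predict(b, 5.25), 1))  # estimate for a noncommon combination
```

In practice the fitted equation would be estimated on the common
combinations and then applied to the pile placement means of the noncommon
combinations in the same deck.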

Cross-Validation of Estimation Equations on a Hold-Out Sample

The ten participants in the last utility workshop were given an
additional 40 combinations (8 MOS x 5 levels) on which to make their direct
judgments of utility. The means of the direct judgment utilities given these
40 combinations were estimated by formulas derived for each deck, excluding
any of the data obtained from the last workshop.

Very  high  correlations  (.97)  were  again  obtained  between  the  utilities 
estimated  from  the  separate  deck  equations  and  the  hold-out  sample  direct 
judgment  utilities.  Moreover,  the  direct  judgment  means,  standard  deviations, 
and  ranges  for  the  40  extra  combinations  obtained  from  the  hold-out  were  quite 
similar  to  those  estimated  from  the  equations. 


THE  FINAL  UTILITY  VALUES 

The  analyses  supported  the  conclusions  that: 

(1) For both methods the reliability of a single judge is reasonably
high.

(2) For both methods the reliability of the average value produced by
11 judges or more is very high.

(3) Reliabilities are high even when performance level is controlled
and differences are due only to MOS differences within performance
level.

(4) The agreement between the two utility scaling methods is very high
and equal to the limit of their reliabilities.

(5) Judges from different posts or MOS backgrounds do not produce
different patterns of scale values.

(6) A relatively simple exercise in equation fitting produced a useful
method for estimating ratio scale values (which could not be
obtained for all MOS x Performance Level combinations) from the
interval scale values which were obtained from all MOS x
Performance Level combinations using the pile placement (Thurstone
sort) method.

(7) As determined on a cross-validation sample of stimuli, the
equations used to estimate ratio values from interval data were
highly accurate (r between estimated and actual values = .97).


The derived equations for each deck were used to estimate the ratio
scale utilities for the noncommon MOS/performance level combinations. These
values represent the bottom line of the Project A utility scaling work.
Within the limits of the reliability and validity evidence discussed here, the
1,365 combinations have been placed on the same ratio scale.
DISCUSSION AND CONCLUSIONS

The assigned utilities had very high reliabilities and the estimated
ratio scale values correlated very highly with direct judgments. These
results held even when performance level was held constant. A personnel
assignment algorithm that took into account the value of performance would
most likely be able to effect more nearly optimal Army-wide assignments than
one that did not.
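
Because the utilities are on a common ratio scale, they can be summed across
individual assignments and alternative assignment plans can be compared by
their totals. The sketch below does this by brute force for three recruits
and three MOS vacancies. The utility values are hypothetical, and exhaustive
search is used only because the example is tiny; it is not the assignment
algorithm referred to in the text.

```python
from itertools import permutations

# Hypothetical utilities of each recruit's predicted performance in each MOS.
# Rows are recruits, columns are MOS vacancies (one vacancy per MOS).
utility = [
    [60, 80, 50],
    [90, 85, 40],
    [70, 75, 65],
]

def total_utility(assignment):
    """Summed utility when recruit i is assigned to MOS assignment[i]."""
    return sum(utility[i][m] for i, m in enumerate(assignment))

# A plan that ignores utility: recruit i simply goes to MOS i.
naive = (0, 1, 2)

# The plan with the largest summed utility among all one-to-one plans.
best = max(permutations(range(3)), key=total_utility)

print("naive plan:", naive, total_utility(naive))
print("best plan: ", best, total_utility(best))
```

For realistic numbers of recruits and vacancies, an optimization method
suited to assignment problems would replace the exhaustive search.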

However, a number of problems need to be addressed before utilities
similar to the ones obtained in this research can be used operationally. One
problem concerns the optimal distribution within MOS, considering both within-
and between-MOS utilities as well as the available recruit pool and the
quality of existing personnel. This is the issue of average vs. marginal
utility (Nord & White, 1988, 1990). Another issue concerns the duration of
time that the recruits actually remain in the Army and how to aggregate values
over time.

Clearly, this research has affirmatively answered the question of
whether a coherent, reliable set of relative utility values could be derived
for all performance levels in all entry-level Army MOS. The next steps
involve how to make best use of that finding in improving the Army's
selection, classification, and assignment processes.
Chapter 6

COMPLETION OF LONGITUDINAL VALIDATION PREDICTOR AND
END-OF-TRAINING DATA COLLECTION


Since one goal of the LV data collection was to administer the
predictors as closely as possible to the point where they would ultimately be
administered operationally, testing during Reception Station processing was
chosen as the most feasible method of obtaining the desired sample. Soldiers
in the LV sample would then be followed into their first tour, where the
first-tour job performance measures would be administered, and eventually into
their second tour, where the second-tour performance measures could be
administered. This data collection process is summarized schematically in
Figure 6-1.


[Figure: LONGITUDINAL VALIDATION Data Collection Schedule, FY86/87 cohort]

Test Dates   Stage              Measures
1986-1987    Reception Station  Experimental Battery
1987-1988    End of Training    School Knowledge Tests; Army-Wide Ratings
                                (Peers & Supervisor)
1988-1989    1st Tour           Hands-On Tests; Job/Task Knowledge Tests;
                                Army-Wide Ratings; MOS-Specific Ratings
1990-1991    2nd Tour           2nd Tour Performance

Figure 6-1. Longitudinal Validation data collection scheme.
SAMPLE AND SCHEDULE


The sample of MOS for the Longitudinal Validation is shown in Table 6-1.
To improve coverage of MOS job families, two MOS (29E, Electronics Repairer,
and 96B, Intelligence Analyst) had been added to the sample used for
Concurrent Validation and one MOS (76W, Petroleum Supply Specialist) had been
deleted. In addition, MOS 19K (M1 Armor Crewman) was added because MOS 19E
(M60 Armor Crewman) was being severely scaled back. These modifications
resulted in an LV sample of 21 MOS, compared to 19 MOS during the CV phase.


Table 6-1

Project A MOS in Longitudinal Validation Sample

Batch A MOS                             Batch Z MOS

11B  Infantryman                        12B  Combat Engineer
13B  Cannon Crewman                     16S  MANPADS Crewman
19E  M60 Armor Crewman                  27E  TOW/Dragon Repairer
19K  M1 Armor Crewman                   29E  Electronics Repairer
31C  Single Channel Radio Operator      51B  Carpentry/Masonry Specialist
63B  Light-Wheel Vehicle Mechanic       54E  NBC Specialist*
71L  Administrative Specialist          55B  Ammunition Specialist
88M  Motor Transport Operator*          67N  Utility Helicopter Repairer
91A  Medical Specialist                 76Y  Unit Supply Specialist
95B  Military Police                    94B  Food Service Specialist
                                        96B  Intelligence Analyst

* MOS 88M was previously identified as MOS 64C.

* MOS 54E subsequently became MOS 54B.

To  obtain  a  large  enough  sample  for  the  extended  testing  involved  in  the 
Longitudinal  Validation,  each  of  the  eight  Reception  Battalions  was  asked  to 
test  all  Regular  Army  soldiers  entering  any  one  of  the  21  MOS  listed  in  Table 
6-1  for  an  entire  year.  Testing  sites  and  data  collection  periods  were  as 
follows: 


Site                   Testing Period

Fort Sill              20 Aug 86 - 20 Aug 87
Fort Benning           27 Aug 86 - 27 Aug 87
Fort Bliss              4 Sep 86 -  4 Sep 87
Fort Knox              10 Sep 86 - 10 Sep 87
Fort McClellan         17 Sep 86 - 17 Sep 87
Fort Dix               24 Sep 86 - 24 Sep 87
Fort Leonard Wood       1 Oct 86 -  1 Oct 87
Fort Jackson           19 Nov 86 - 19 Nov 87
THE EXPERIMENTAL BATTERY

Table 6-2 shows the complete array of tests and inventories in the
Experimental Battery, the number of items in each, and the time limit (for the
timed tests) or approximate time to finish (for the computer-administered
tests and the untimed inventories).


Table 6-2

Description of Tests in Experimental Battery

                                           Number of    Time Limit
Cognitive Paper-and-Pencil Tests             Items       (minutes)

Reasoning Test                                 30          12
Object Rotation Test                           90           7.5
Orientation Test                               24          10
Maze Test                                      24           5.5
Map Test                                       20          12
Assembling Objects Test                        36          18

                                           Number of    Approximate Time
Computer-Administered Tests                  Items       (minutes)

Demographics                                    2           4
Reaction Time 1                                15           2
Reaction Time 2                                30           3
Memory Test                                    26           7
Target Tracking Test 1                         18           8
Perceptual Speed and Accuracy Test             36           6
Target Tracking Test 2                         18           7
Number Memory Test                             28          10
Cannon Shoot Test                              36           7
Target Identification Test                     36           4
Target Shoot Test                              30           5

Non-Cognitive Paper-and-Pencil             Number of    Approximate Time
Inventories                                  Items       (minutes)

Assessment of Background and Life
  Experiences (ABLE)                          199          35
Army Vocational Interest Career
  Examination (AVOICE)                        182          20
Job Orientation Blank (JOB)                    31           5



The information obtained from Concurrent Validation data analysis was
used to make the final revisions to the predictor battery for the Longitudinal
Validation. Since the battery had already been through several iterations of
data collection, analysis, and revision, the revisions were not substantial.

Of the six cognitive tests, only one had actual item content change.
The Assembling Objects test was made more difficult by adding four new items
and revising three existing items; two minutes were added to the time limit.
For the computerized portion of the battery, minor modifications were made to
the instructions, several changes were made in the software, and several items
on the Target Identification Test were revised to balance the item types
better.

The ABLE revisions included deleting 10 items, revising 16 items, and
using a separate answer sheet for responding. For the AVOICE, several changes
were made in the scoring procedures, switching already existing items to
scales where their item-total score correlations were higher, and in two cases
combining two pre-existing scales. Ten items were dropped from the AVOICE, 16
were added, several scales were renamed, and a separate answer sheet was
prepared. The JOB was shortened by seven items and had five items reworded,
and all scales were reconstituted and renamed, based on factor analyses of the
CV data. A list of the scales on all three non-cognitive inventories appears
as Table 6-3.
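
The item-count changes described above can be checked with simple
arithmetic. The sketch below is illustrative only: it assumes that the item
counts in Table 6-2 are the post-revision (LV) counts, which the text implies
but does not state outright, and it backs out the item counts the instruments
would have had before revision. The function and variable names are
hypothetical.

```python
# Item counts in the revised (LV) battery, taken from Table 6-2.
# Assumption: Table 6-2 reports counts AFTER the revisions described above.
lv_items = {"Assembling Objects": 36, "ABLE": 199, "AVOICE": 182, "JOB": 31}

# Net change in item count described in the text (items added minus deleted).
# Items that were only revised or reworded do not change the count.
net_change = {
    "Assembling Objects": +4,  # four new items added
    "ABLE": -10,               # 10 items deleted
    "AVOICE": 16 - 10,         # 16 items added, 10 dropped
    "JOB": -7,                 # shortened by seven items
}


def implied_prior_items(lv, change):
    """Back out the pre-revision item count for each instrument."""
    return {name: lv[name] - change[name] for name in lv}


prior_items = implied_prior_items(lv_items, net_change)
for name, n in prior_items.items():
    print(f"{name}: {n} items before revision, {lv_items[name]} after")
```

Under that assumption, the same subtraction applied to the Assembling Objects
time limit (18 minutes after two minutes were added) implies a 16-minute limit
before revision.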


TRAINING  PERFORMANCE  MEASURES 

As part of the Longitudinal Validation, criterion measures of training
performance were collected on each individual at the end of AIT or OSUT. The
end-of-training measures were administered to soldiers at the eight predictor
testing installations and at six other AIT-only installations where the
Project A MOS were trained. These 14 installations, the MOS tested at each,
and the data collection period for each are shown in Table 6-4.

The training measures consisted of a number of rating scale evaluations
collected from the individual's Drill Instructor and the training achievement
test previously developed for each MOS.

The development and field testing of the paper-and-pencil achievement
tests were described in the FY85 Annual Report (Campbell, 1987a) and in Davis
et al. (1986). The rating scale measures were modified versions of the
Army-wide BARS scales used as job performance measures (Pulakos & Borman, 1986).

The  following  scales  were  used: 

A.  Technical  Knowledge/Skill 

B.  Effort 

C.  Following  Regulations  and  Orders 

D.  Military  Appearance 

E.  Physical  Fitness 

F.  Self-Control 

G.  Leadership  Potential 
Table 6-3

ABLE, AVOICE, and JOB Scales in Experimental Battery

ABLE Scales

Adjustment:                      Emotional Stability

Dependability:                   Nondelinquency
                                 Traditional Values
                                 Conscientiousness

Achievement:                     Work Orientation
                                 Self-Esteem

Surgency (Leadership/Potency):   Dominance
                                 Energy Level

Agreeableness/Likability:        Cooperativeness

Locus of Control:                Internal Control

Physical Condition:              Physical Condition

Response Validity Scales:        Unlikely Virtues (Social Desirability)
                                 Self-Knowledge
                                 Non-Random Response
                                 Poor Impression

AVOICE Scales

Realistic:                Mechanics                  Fire Protection
                          Heavy Construction         Audiographics
                          Electronic Communication   Rugged Individualism
                          Drafting                   Firearms Enthusiast
                          Law Enforcement            Combat Vehicle Operator

Conventional:             Clerical/Administrative    Food Service-Professional
                          Warehousing/Shipping       Food Service-Employee

Social & Enterprising:    Leadership/Guidance

Investigative:            Medical Services           Science/Chemical
                          Mathematics                Computers

Artistic:                 Aesthetics

JOB Scales

Job Pride                 Job Autonomy
Job Security              Job Routine
Serving Others            Ambition




Table 6-4

End-of-Training Data Collection Sites and Data Collection Period

Site                   MOS    End-of-Training Testing Period

Fort Sill              13B    15 Nov 86 - 21 Nov 87

Fort Benning           11B    12 Nov 86 -  4 Dec 87

Fort Bliss             16S     8 Jan 87 - 22 Jan 88

Fort Knox              19E     6 Dec 86 - 12 Dec 87
                       19K    16 Dec 86 - 12 Dec 87

Fort McClellan         54B    28 Mar 87 - 16 Apr 88
                       95B    24 Jan 87 - 16 Jan 88

Fort Dix               63B     7 Mar 87 - 27 Feb 88
                       88M    24 Jan 87 - 23 Jan 88
                       94B     7 Feb 87 -  4 Feb 88

Fort Leonard Wood      12B    17 Jan 87 -  9 Jan 88
                       51B    31 Jan 87 - 23 Jan 88
                       63B     7 Mar 87 -  6 Feb 88
                       88M    14 Feb 87 - 30 Mar 88

Fort Jackson           63B     2 May 87 - 16 Apr 88
                       71L    15 Apr 87 -  6 Apr 88
                       76Y    28 Mar 87 -  2 Apr 88
                       94B    18 Apr 87 -  2 Apr 88

Redstone Arsenal       27E    10 Mar 87 - 21 Apr 88
                       55B    11 Dec 86 -  3 Mar 88

Fort Lee               76Y    10 Jan 87 - 17 Feb 88
                       94B    10 Jan 87 - 18 Feb 88

Fort Rucker            67N    17 Jan 87 - 13 Feb 88

Fort Sam Houston       91A    19 Feb 87 - 30 Mar 88

Fort Gordon            29E    27 Apr 87 - 14 Apr 88
                       31C    13 Feb 87 - 18 Apr 88

Fort Huachuca          96B    14 Apr 87 - 11 Apr 88
DATA COLLECTION PROCEDURES

Considerable time and effort were spent initiating, designing, coordinating,
and monitoring the LV predictor and training criteria data collection.
Numerous briefings were conducted at various points down the chain of
command, culminating in several meetings with the POC at each of the eight
Reception Battalion sites, several months prior to data collection at that
site. From this point until testing began, coordination was taken over by the
POC, who was responsible for providing the required troops, space, and
necessary equipment. The two primary challenges in preparing each site were
(a) fitting 4 hours of testing into an already demanding 72-hour processing
schedule, and (b) obtaining adequate space for testing that met good testing
standards, every day for a full year.

A test site manager (TSM) was hired to be in charge of each data
collection site, and was supported by from one to as many as eight test
administrators. Applications were taken by mail for both positions, and all
initial interviewing and hiring was done on site by experienced Project A
staff. Detailed test administration manuals were prepared and used as the
basis for a one-week training course, conducted at each site for the newly
hired personnel. Also, scripts were prepared for administering each test or
inventory, and test site personnel were trained in their use as well as in
handling questions.

Each week the TSM called the Project A staff person in charge of the
data collection and reported the number of soldiers tested the prior week,
discussed any questions or problems he or she had, and received relevant news
or instructions. In addition, each site was required to submit monthly
written reports of its testing progress and documentation of any problems
that had occurred or events that may have had an impact on test results.

Finally, Project A contractor or ARI staff visited each site from one to
three times to monitor the test administration, provide feedback where
appropriate, and go over questions or unresolved problems.


SAMPLE  SIZES 
Predictor  Data 

The final sample sizes for the Longitudinal Validation predictor data
collection are shown in Tables 6-5, 6-6, and 6-7. Table 6-5 shows the number
of soldiers at each of the reception battalions who took at least one of the
five components of the predictor battery: computer, spatial (paper-and-pencil
cognitive), ABLE, AVOICE, and JOB. As the table shows, 49,397 soldiers
participated in the administration of the predictor battery. Fort Benning had
the largest percentage of the sample, with 28.7 percent, followed by Fort
Jackson with 17.6 percent. Forts McClellan, Leonard Wood, and Sill were next
with 11.9, 11.5, and 10.3 percent, respectively. Forts Dix and Knox were
next, with 8.4 percent and 7.8 percent, respectively, while Fort Bliss had the
smallest percentage, 3.7.
Table 6-5

Longitudinal Validation: Predictor Data Collected at Each Reception Battalion

                                              Cumulative   Cumulative
Post                  Frequency    Percent    Frequency    Percent

Fort Benning            14,188       28.7       14,188        28.7
Fort Bliss               1,842        3.7       16,030        32.5
Fort Dix                 4,160        8.4       20,190        40.9
Fort Jackson             8,700       17.6       28,890        58.5
Fort Knox                3,857        7.8       32,747        66.3
Fort McClellan           5,885       11.9       38,632        78.2
Fort Sill                5,067       10.3       43,699        88.5
Fort Leonard Wood        5,698       11.5       49,397       100.0
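
The percent and cumulative columns of Table 6-5 follow directly from the
frequency column. The short sketch below recomputes them as a consistency
check; the frequencies are those reported in the table, while the variable
names and the one-decimal rounding are assumptions made for illustration.

```python
from itertools import accumulate

# (post, frequency) pairs in the row order of Table 6-5.
rows = [
    ("Fort Benning", 14188),
    ("Fort Bliss", 1842),
    ("Fort Dix", 4160),
    ("Fort Jackson", 8700),
    ("Fort Knox", 3857),
    ("Fort McClellan", 5885),
    ("Fort Sill", 5067),
    ("Fort Leonard Wood", 5698),
]

total = sum(freq for _, freq in rows)
cumulative = list(accumulate(freq for _, freq in rows))

# Each output row: post, frequency, percent, cumulative frequency,
# cumulative percent (percentages rounded to one decimal place).
table = [
    (post, freq, round(100 * freq / total, 1), cum, round(100 * cum / total, 1))
    for (post, freq), cum in zip(rows, cumulative)
]

for post, freq, pct, cum, cum_pct in table:
    print(f"{post:<18} {freq:>7,} {pct:>6.1f} {cum:>8,} {cum_pct:>6.1f}")
```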

Table 6-6

Longitudinal Validation: Predictor Data Collected by MOS

                                    Cumulative   Cumulative
MOS      Frequency     Percent      Frequency    Percent

Unk           64          0.1            64         0.1
11B       14,193         28.7        14,257        28.9
12B        2,118          4.3        16,375        33.1
13B        5,087         10.3        21,462        43.4
16S          800          1.6        22,262        45.1
19E          583          1.2        22,845        46.2
19K        1,849          3.7        24,694        50.0
27E          139          0.3        24,833        50.3
29E          257          0.5        25,090        50.8
31C        1,072          2.2        26,162        53.0
51B          455          0.9        26,617        53.9
54B          967          2.0        27,584        55.8
55B          482          1.0        28,066        56.8
63B        2,241          4.5        30,307        61.4
88M        1,593          3.2        31,900        64.6
67N          334          0.7        32,234        65.3
71L        2,140          4.3        34,374        69.6
76Y        2,756          5.6        37,130        75.2
91A        4,219          8.5        41,349        83.7
94B        3,522          7.1        44,871        90.8
95B        4,206          8.5        49,077        99.4
96B          320          0.6        49,397       100.0
This sample is broken down by MOS in Table 6-6. MOS 11B is by far the
largest MOS in the sample, with 14,193 soldiers representing 28.7 percent of
the total. Next is 13B (5,087, 10.3 percent), followed by 91A with 4,219 and
95B with 4,206 (each about 8.5 percent). The four least populous MOS in the
sample are 27E with 139 (0.3 percent), 29E with 257 (0.5 percent), 96B with
320 (0.6 percent), and 67N with 334 (0.7 percent). The MOS for 64 soldiers in
the sample remains unknown at this time.

Table 6-7 displays the predictor administration by reception battalion
and by MOS, and also provides information on the extent of complete versus
partial data. A soldier is counted as "partial" if one or more of the five
predictor battery components is missing. For the total sample, 37,434
soldiers (75.8 percent) had complete data, that is, a record for all five
components of the predictor battery for each individual. To accommodate the
large number of soldiers being processed at any one time at Fort Benning, the
predictor administration was set up to administer the computer component of
the battery to only about one-third of the soldiers who came through the
reception battalion. If the 9,884 soldiers at Fort Benning who did not take
the computer component are excluded, the percentage of soldiers on whom we
have complete data increases to 94.7.
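
The two completeness percentages quoted above can be reproduced from the
counts given in the text. The following is a minimal sketch of that
arithmetic; the function and variable names are illustrative, not part of the
original analysis.

```python
# Counts reported in the text.
total_tested = 49_397          # soldiers with any predictor data
complete_records = 37_434      # soldiers with all five battery components
benning_no_computer = 9_884    # Fort Benning soldiers not given the computer tests


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Share of the full sample with complete data.
overall = percent(complete_records, total_tested)

# Share with complete data once the Fort Benning soldiers who were never
# scheduled for the computer component are removed from the denominator.
adjusted = percent(complete_records, total_tested - benning_no_computer)

print(overall, adjusted)
```

The adjustment changes only the denominator (49,397 - 9,884 = 39,513), since
none of the excluded soldiers could have had complete data.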

End-of-Training Data

The final sample sizes for the Longitudinal Validation end-of-training
data collection are shown by post and by MOS in Table 6-8. The number who
took the end-of-training measures is shown by whether a soldier took the
training achievement test (K), the rating scales (R), or both (BOTH).
Virtually all soldiers took both parts. Table 6-8 shows that 33,863 soldiers
out of 34,305 (98.7%) took both end-of-training measures.

Both  Predictor  and  EOT  Data 

Table 6-9 compares the number of soldiers, by MOS, for whom there are
both predictor and end-of-training data. These are the samples that were
followed up with the first-tour performance measures. Of the 49,397 soldiers
having predictor data, 34,305 (69%) also have end-of-training data. The
percentage by MOS ranges from 50 to 92.
Table 6-9

Longitudinal Validation: Comparison of Soldiers with Predictor Data
Who Also Have End-of-Training Data, by MOS

                                                                  Percent With
             Predictor Data           End-of-Training Data       Predictor and
MOS     Complete  Partial   Total     K     R     Both    Total    EOT Data*

11B       4,308    9,865   14,193    277    16    7,602    8,097       57
12B       2,092       26    2,118      4          1,857    1,861       88
13B       4,835      252    5,087     17     5    4,655    4,677       92
16S         781       19      800      3            578      581       73
19E         578        5      583      1            443      444       76
19K       1,806       41    1,849      3          1,592    1,595       86
27E         138        1      139      1             91       92       66
29E         212       45      257                   139      139       54
31C         956      116    1,072     10            652      662       62
51B         441       14      455                   349      349       77
54B         881       86      967      2            589      591       61
55B         462       20      482      1            384      385       80
63B       2,094      147    2,241     12     1    1,162    1,175       52
67N         328        6      334     10     1      221      232       69
71L       1,905      235    2,140      3          1,402    1,405       66
76Y       2,475      281    2,756     11     2    1,622    1,635       59
88M       1,494       99    1,593     11     3    1,250    1,264       79
91A       3,935      284    4,219     10     1    3,164    3,175       75
94B       3,279      243    3,522     23     1    1,720    1,744       50
95B       4,101      102    4,203      6          3,580    3,586       85
96B         281       39      320      3            188      191       60
UNK          47       17       64      2            426      428

Total    37,434   11,963   49,397    410    32   33,863   34,305       69

*  Computed  as  total  end-of-training  data  divided  by  total  predictor  data. 
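
The footnote's computation can be illustrated for a few rows of Table 6-9.
This is a sketch only: the counts are those printed in the table, the
rounding to a whole percent is an assumption consistent with the printed
values, and the names are hypothetical.

```python
# (total predictor data, total end-of-training data) for selected MOS rows
# of Table 6-9, plus the Total row.
rows = {
    "11B": (14193, 8097),
    "13B": (5087, 4677),
    "94B": (3522, 1744),
    "Total": (49397, 34305),
}


def pct_with_eot(predictor_total, eot_total):
    """Percent of soldiers with predictor data who also have EOT data."""
    return round(100 * eot_total / predictor_total)


percents = {mos: pct_with_eot(pred, eot) for mos, (pred, eot) in rows.items()}

# Range across the individual MOS rows shown here (Total row excluded).
by_mos = [p for mos, p in percents.items() if mos != "Total"]
print(percents, min(by_mos), max(by_mos))
```

Among the rows shown, 94B and 13B give the low and high ends of the 50-to-92
range reported in the text.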
Chapter 7

REVISION OF FIRST-TOUR JOB PERFORMANCE MEASURES


The first-tour LV criterion measures were the same as those used for the
Concurrent Validation, except that they were updated as described in this
chapter. The 3-year time period between the Concurrent Validation and the
Longitudinal Validation raised the issue that some criterion content might be
outdated. Equipment and/or procedural changes would require test revisions;
changes in MOS responsibilities had the potential of making some tasks
obsolete.

Project staff identified relevant changes so that the appropriate
revisions could be made. In a few cases where an entire task was obsolete,
the task was dropped without replacement. In many cases, revisions were
simply a matter of replacing outdated terminology. Updated criterion measures
were forwarded to the MOS proponents for a currency review and additional
revisions were made on the basis of this review.

Hands-on Measures. Lessons learned from the Concurrent Validation
led to the use of a different format for the hands-on test sheets. An
overall effectiveness rating for performance on each task (on a scale of 1 to
7) was added at the end of each task score sheet for hands-on tests in the
expectation that it would provide unique task performance information.

After a search for additional first-tour measures that would have
relevance for combat readiness, a computer-simulated M16 rifle marksmanship
task, the Multipurpose Arcade Combat Simulator (MACS), originally developed
for application as a training aid, was selected. Using a demilitarized M16
rifle, the soldier "shoots" at targets displayed on a computer monitor.
Attached to the barrel of the rifle is a light pen which simulates the path of
the rounds, and the screen displays a total of 30 targets, some moving and some
stationary. Using the MACS, a test of "Engage targets with an M16" was added
to the criterion measures for two MOS, 11B and 95B.

Rating Scales. The time period between the two data collections was
crucial for MOS 19E (M60 Armor Crewman) because this MOS was being severely
scaled back as MOS 19K (M1 Armor Crewman) was being phased in. The two differ
with respect to the kind of tank (M60 or M1) that the soldiers operate. To
deal with the transition, a job analysis of 19K was conducted and a complete
set of criterion measures was developed specifically for this new MOS. The
same procedures used for the other MOS (Campbell, 1987b) were followed, with
one exception: The 19K MOS-specific rating scales were developed by SMEs
from the Armor School and by 19E NCOs. Because of the 19E/K split, the
Longitudinal Validation data collection included 10 MOS in Batch A rather than
nine.


While there was considerable interest in keeping the Combat Performance
Prediction scale, project staff and the Scientific Advisory Group agreed that
the version used in the Concurrent Validation was too lengthy. Two
alternatives were considered. The first was simply to reduce the number of
items in the original summated rating scale of 40 items. The second was to
reduce the specific behavioral items to summary dimensions. Three dimensions
were derived through empirical and rational analysis, and the new scales were
field tested in conjunction with the second-tour criterion measure field
tests. Low reliability estimates for the dimensional ratings led to the
decision to retain the original summated scale format, but the total number of
items in that summated scale was reduced from 40 to 19.

The final set of Combat Prediction Scale items was selected by considering
interrater reliability, internal consistency, and content coverage. That
is, items were dropped if their content was covered in another item whose
reliabilities were higher, or if their content was specifically technical and
therefore covered by another measure, such as a hands-on test or a rating
dimension. Three of the original items were deleted because SMEs indicated
that the items were not meaningful. The SMEs were field grade officers and
senior NCOs with combat or tactical field exercise experience. Another change
from the CV version was to use a less cumbersome 7-point scale rather than a
15-point scale.
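
As an illustration of how internal consistency can inform item selection for
a summated rating scale, the sketch below computes coefficient alpha
(Cronbach's alpha) for a small set of ratings and shows how alpha changes when
one item is dropped. This is an assumption-laden example, not the project's
procedure: the report does not say which internal consistency index was used,
and the ratings below are invented solely for demonstration.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Coefficient alpha for a list of ratee rows, each a list of item scores."""
    k = len(ratings[0])
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(ratings, item):
    """Alpha recomputed with one item (column index) removed."""
    reduced = [[x for j, x in enumerate(row) if j != item] for row in ratings]
    return cronbach_alpha(reduced)


# Hypothetical 7-point ratings: six ratees (rows) on four items (columns).
# The last item is deliberately inconsistent with the other three.
ratings = [
    [6, 6, 5, 2],
    [5, 5, 5, 6],
    [3, 4, 3, 3],
    [2, 2, 3, 5],
    [7, 6, 6, 4],
    [4, 4, 4, 1],
]

alpha_all = cronbach_alpha(ratings)
alpha_without_last = alpha_if_deleted(ratings, 3)
print(round(alpha_all, 2), round(alpha_without_last, 2))
```

In this invented example, removing the inconsistent fourth item raises alpha,
which is the kind of evidence that, together with interrater reliability and
content coverage, could support dropping an item.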

Personnel File Form. The self-report form for gathering information on
administrative records (the Personnel File Form) was updated by reviewing its
contents with officers and NCOs who were representatives of the Army's
military personnel center. The form was revised to allow soldiers to report
administrative actions by pay grade, and to report the date of their last M16
qualification.

Army Job Satisfaction Questionnaire. A new measure developed by the ARI
staff was the Army Job Satisfaction Questionnaire. It was intended to provide
information that would be potentially useful for predicting attrition and for
understanding the relationship of job satisfaction with other constructs
investigated. The satisfaction measure was developed in several stages.
First, a number of job satisfaction dimensions of relevance to the Army were
identified through an extensive search of the literature. Second, items were
written to tap each of these dimensions. Items were also written to elicit
background information that would help clarify the respondent's frame of
reference with respect to his or her perceived satisfaction levels (e.g.,
reasons for enlisting).

The draft questionnaire was administered to the examinees in the
second-tour criterion measure field tests. The Minnesota Satisfaction
Questionnaire (MSQ short form) was also administered as a marker instrument.
The final set of 18 satisfaction items was selected based on reliability and
meaningfulness of the factor structure of the total set. These items assess
six aspects of job satisfaction (supervision, co-workers, promotions, pay,
work, and Army). Thirteen "frame of reference" items were selected for
inclusion on the final questionnaire.

Deleted  Measures.  Four  measures  were  deleted  from  the  array  of  Batch  A 
first-tour  criterion  measures  used  during  the  Concurrent  Validation.  The 
ratings  of  performance  on  the  15  tasks  selected  for  hands-on  testing  in  each 
MOS  were  eliminated  from  the  MOS-specific  performance  rating  scales  because 
they were not sufficiently reliable. The common task ratings from the
Army-wide rating scales were deleted for the same reason. Two auxiliary measures
deleted  were  the  Measurement  Method  Rating  and  the  Army  Work  Environment 
Questionnaire. 

Batch Z MOS. With respect to the Batch Z MOS, the school knowledge
tests had been submitted to a currency review just prior to the Longitudinal
Validation predictor (including training performance) data collection. A
second currency review for the criterion data collection was considered
neither necessary nor practical. In the currency review, the item pool for
each MOS was submitted to the final authority for doctrine on that MOS, the
TRADOC proponent, for review and approval. Proponents were free to recommend
deletions, additions, and modifications to the test items.

Table 7-1 lists the final array of measures and supplemental information
that were administered to and gathered from first-tour examinees during the
Longitudinal Validation criterion data collection.


Table  7-1 

First-Tour  Measures  and  Supplemental  Information  Administered  to  and  Gathered 
From  LV  Sample 


Batch A:  Personnel File Form
          Army-Wide Performance Rating Scales
          MOS-Specific Rating Scales
          Combat Performance Prediction Scale
          Hands-on Tests
          Job Knowledge Tests

Batch Z:  Personnel File Form
          Army-Wide Performance Rating Scales
          Combat Performance Prediction Scale
          School Knowledge Tests

Supplemental Information (Both Batch A and Batch Z):
          Background Information Form
          Army Job Satisfaction Questionnaire
          Job History Questionnaire
          Physical Requirements Survey


*  Non-Project  A  measure  administered  in  conjunction  with  this  data  collection 
effort. 
Chapter 8

ANALYSIS  FOR  SECOND-TOUR  JOB  PERFORMANCE  MEASURES 


The Project A research plan called for the development of NCO job
performance measures which could be used in a second-tour follow-up of two
accession cohorts (FY83/84 and FY86/87) for purposes of determining
selection/classification/promotion strategies for NCOs. To develop strategies
for identifying NCO potential, measures of second-tour job performance are
needed. After the criteria are available, the following questions could be
examined: To what extent does the Experimental Predictor Battery predict
performance beyond the first term of enlistment? Does early performance
predict later performance, when additional responsibilities such as
supervision and leadership are presumably required? What is the optimal
combination of selection/classification test information and first-tour
performance data for predicting second-tour performance? How does entry-level
training performance relate to later first-tour and second-tour job
performance?

Over  the  life  cycle  of  Project  A,  the  full  round  of  the  data  collections 
and  analyses  necessary  to  answer  these  critical  questions  could  not  be 
completed.  However,  the  required  job  analysis  was  completed  and  the  criterion 
development  work  was  begun. 


JOB  ANALYSIS  FOR  SECOND  TOUR 
The  specific  goals  of  the  job-analytic  work  were  to: 

•  Describe the major differences between entry-level and higher
level performance content, within MOS.

•  Describe the major differences across MOS, within higher level
jobs.

•  Describe the specific nature of the supervisory/leadership
component of these higher level jobs.

Once  these  objectives  were  achieved,  the  information  would  be  used  to 
address  four  questions: 

•  What  should  be  the  content  of  the  new  criterion  measures? 

•  What  kinds  of  measurement  methods  are  needed? 

•  Are  separate  measures  needed  for  each  job?  Or  are  the  jobs  so 
similar  that  the  same  measures  can  be  applied  to  all? 

•  To  what  extent  can  measures  developed  for  entry-level  soldiers  be 
used  among  higher  level  soldiers? 


The second-tour samples were to be taken from the nine MOS in Batch A
and were intended to be subsamples of the FY83/84 and FY86/87 validation
samples. The term "second tour" was used by Project A to designate soldiers
who have been in the Army between 3 and 5 years. Paygrade will vary from one
MOS to another because of differences in density and promotion needs of the
Army. Projections indicated that the proportion who would be E5s would be
between 20 and 70 percent across MOS. Most others would be E4s; a very few
would be E6s.


During FY85, 4,930 soldiers who entered the nine Batch A MOS during
the FY83/84 window were tested in the Concurrent Validation sample on the
predictor battery, training tests, and first-tour criterion measures. This
sample forms the basis of the CVII follow-up. For the second longitudinal
follow-up, LVII, the cohort that entered the Army in FY86/87 and was tested
on the Experimental Battery and the training knowledge and performance
measures can be followed into its second tour and measured on the job
performance criterion measures. These samples are described in
Chapter 6 of this report.

SECOND-TOUR  JOB  ANALYSIS  METHODS 


By Army policy,* all soldiers at a higher skill level are responsible
for being able to perform all tasks at each lower skill level, as well as the
tasks at their current skill level. Consequently, the first-tour job analyses
were used as a starting point and additional job analysis information was
collected to describe the second-tour changes. In addition, the issue of
leadership/supervision performance was of special concern.

To capture both the technical and the supervisory aspects of an MOS,
four methods of job analysis were used: task analysis, a standardized
questionnaire measure of supervisory and leadership responsibilities, critical
incident analysis, and interviews with small groups of senior NCOs.


Task-Based  Job  Analysis 


Specification of the population of second-tour technical tasks
proceeded as for first-tour analysis, by combining information from the
Soldier's Manuals for each MOS (a Soldier's Manual is prepared by the
proponent agency for every skill level within an MOS) and data from the Army
Occupational Survey Program (AOSP). After being edited for redundancies and
level of generality, AOSP items that could not be matched with Soldier's Manual
tasks were added to the population of tasks for that MOS. The proponent Army
agencies then reviewed the list for completeness and accuracy.

The total task domains for the nine MOS ranged between 153 and 409
tasks each, with an average of 260. To aid in the selection of a representative
sample of critical tasks for criterion measurement, judgments of task
criticality and performance difficulty were then obtained from 15 officers/
SMEs who had recent field experience supervising E5s. The officers and SMEs
were obtained through the ARI troop support request (TSR) process. The grade,
MOS, and experience criteria for the officers and SMEs are laid out in the TSR,
which is then distributed to the appropriate installation for action. The
assigned point-of-contact works with a member of the project staff to iron out
the specific details of the data collection, including secondary and tertiary
criteria for SME selection.


*Army Regulation 611-201, Enlisted Career Management Fields and Military
Occupational Specialties.

Also,  task  clusters  were  developed  for  the  second  tour  by  using  the 
first-tour  clusters  as  the  starting  point.  That  is,  the  new  second-tour  tasks 
were  sorted  into  these  same  clusters  by  the  project  staff.  Where  no  clusters 
of  first-tour  tasks  were  similar  to  the  new  second-tour  tasks,  new  clusters 
were  formed. 

A Standardized Description of the Supervisory Components
of Second-Tour MOS

At the same time that the technical task descriptions were being
developed for each MOS, work was also proceeding on a standardized description
of supervisory/leadership activities. The item content was derived from two
instruments previously developed by ARI researchers: the Supervisory
Responsibility Questionnaire, a 34-item instrument based on critical incidents
describing effective and ineffective NCO leader behavior (White, Gast, &
Rumsey, 1986); and a very comprehensive questionnaire checklist, the Leader
Requirements Survey, which contained 450 items and was designed to describe
supervisory/leadership activities at all NCO and officer ranks. Both
instruments were based on extensive development work and took advantage of the
large pool of literature on leader/supervisor behavior (Gast, Campbell,
Steinberg, & McGarvey, 1987).


Both questionnaires were administered to NCOs in the nine jobs.
Approximately 50 NCOs received the Leader Requirements Survey, and 125 NCOs
received the Supervisory Responsibility Questionnaire. All SMEs were asked to
indicate the importance of each task for performance at the sergeant (E5)
level.

Analysis of the Supervisory Responsibility Questionnaire data
confirmed that all the tasks were sufficiently important to be retained. The
Leader Requirements Survey importance data were used to select tasks that over
half of the respondents indicated were absolutely essential to the sergeant's
job, and 53 tasks were retained.
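The retention rule for the Leader Requirements Survey (keep a task when more
than half of the respondents rate it absolutely essential) can be sketched as
follows. This is only an illustration under stated assumptions: the function
name, the task names, and the rating labels other than "absolutely essential"
are hypothetical and do not come from the survey itself.

```python
def select_essential_tasks(ratings, essential_label="absolutely essential"):
    """Return the tasks that more than half of the respondents rated essential."""
    selected = []
    for task, responses in ratings.items():
        # Share of respondents who gave this task the top importance rating.
        share = sum(1 for r in responses if r == essential_label) / len(responses)
        if share > 0.5:  # strictly more than half, as in the rule described above
            selected.append(task)
    return selected


# Hypothetical ratings from four respondents (illustrative data only).
ratings = {
    "Conduct inspections": [
        "absolutely essential", "absolutely essential",
        "important", "absolutely essential",
    ],
    "Prepare duty rosters": [
        "important", "absolutely essential", "important", "not important",
    ],
    "Counsel subordinates": [
        "absolutely essential", "important", "absolutely essential", "important",
    ],
}

retained = select_essential_tasks(ratings)
print(retained)
```

In this toy data only the first task is retained; a task rated essential by
exactly half of the respondents does not pass a strict "over half" rule.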

Content  analysis  of  the  two  task  lists  resulted  in  a  single  list  of 
46  tasks  that  incorporated  all  of  the  activities  on  both  lists.  These  tasks, 
in  eight  clusters,  were  added  to  the  second-tour  job  task  list  for  each  of  the 
nine  jobs  prior  to  collection  of  task  characteristics  data.  Later  they  were 
made  part  of  the  task  clustering  judgments. 

The selection of sample tasks for measurement was based on the
importance rating for each task, the performance difficulty and expected
performance variability for each task, the frequency of task performance as
shown by the AOSP analyses, and the task cluster membership for each task. A
Delphi panel of SMEs selected 45 tasks for each job: 30 technical and 15
supervisory.

The individual panel members first independently selected tasks,
using the given targets for each cluster. The choices were tallied and
presented to the panel in the second session. They again made independent
selections, this time giving reasons for each of their choices. The choices
were tallied, the reasons summarized, and the results fed back for
consideration through three rounds of independent selections. In the fourth
session, the remaining differences were discussed and resolved. Panel members
also assigned a complete priority ranking (1-45) for inclusion in the final set.
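The tally-and-feedback step of a Delphi round, as described above, might be
sketched as below. This is a minimal illustration, not the project's actual
procedure: the panelist and task identifiers are hypothetical, and treating a
task as "unresolved" when some but not all panelists chose it is an assumption
made for the example.

```python
from collections import Counter


def tally_round(selections):
    """Count how many panel members chose each task in one round."""
    counts = Counter()
    for chosen in selections.values():
        counts.update(set(chosen))  # each panelist counts once per task
    return counts


def unresolved_tasks(counts, n_panelists):
    """Tasks chosen by some, but not all, panelists remain open for feedback."""
    return sorted(task for task, c in counts.items() if 0 < c < n_panelists)


# Hypothetical independent selections from a three-member panel.
round_one = {
    "panelist_a": ["task_1", "task_2", "task_4"],
    "panelist_b": ["task_1", "task_3", "task_4"],
    "panelist_c": ["task_1", "task_2", "task_4"],
}

counts = tally_round(round_one)
open_items = unresolved_tasks(counts, n_panelists=len(round_one))
print(counts["task_1"], open_items)
```

The tallies would be fed back to the panel before the next round of
independent selections; only the open items need further discussion.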

Critical  Incident-Based  Job  Analysis 

To incorporate the Army-wide versus MOS-specific distinction, an
inductive critical incident analysis strategy, which requires persons familiar
with the jobs to generate examples of effective, mid-range, and ineffective
performance behavior, was again used, as in the first-tour job analyses
(Pulakos & Borman, 1986; Toquam, et al., 1986). Content analysis of the
examples then yields preliminary dimensions of performance, and an independent
retranslation of the examples into the dimensions provides a way of checking
on the content validity of the dimension system.

Army-Wide Analysis. Three workshops were conducted in which
participants were asked to generate non-MOS-specific examples of what they
considered to be specific second-tour performance episodes. A total of 1,000
critical incidents were generated by 172 officers and NCOs. Table 8-1 shows
characteristics of the participants in the workshops. These incidents were
edited to a common format and then content analyzed to form 12 preliminary
dimensions of second-tour Army-wide performance. The nine performance
categories that had been developed for the first-tour soldiers were also found
in the second-tour analysis; in addition, three generic supervisory dimensions
emerged, which suggested that second-tour soldiers do, in fact, perform most
of the work that first-tour soldiers perform and also supervise that work.
The retranslation results indicated that all 12 of the dimensions resulting
from the initial categorization of the incidents should be retained. The
second-tour array of 12 Army-wide performance dimensions is shown in Table
8-2.

MOS-Specific Analysis. Development of the second-tour MOS-specific
dimensions followed a different procedure and involved a process for revising
the existing first-tour MOS-specific rating scales so that they would be
appropriate for describing and evaluating second-tour performance.




Table 8-1

Participants in Second-Tour Workshops for Generation of Army-Wide
Critical Incidents

Site(a):    Fort Bragg       102
            Fort Carson       53
            Other              3

Rank(b):    NCOs       n         Officers     n
            E-5       19         O1           8
            E-6       13         O2          26
            E-7        2         O3          82
            E-8        1         O4          18
                                 O5           2

Gender(c):  Male             154
            Female            17

                                          NCOs          Officers
Mean Time in Army:                     6.92 years      4.51 years
Mean Time in Supervisory Position:     5.09 years      4.25 years

(a) Fourteen participants left this blank.
(b) One participant left this blank.
(c) One participant left this blank.


Table 8-2

Army-Wide Dimensions for Second Tour

 A.  Displaying Technical Knowledge/Skill
 B.  Displaying Effort, Conscientiousness, and Responsibility
*C.  Organizing, Supervising, Monitoring, and Correcting Subordinates
*D.  Training and Developing
*E.  Showing Consideration and Concern for Subordinates
 F.  Following Regulations/Orders and Displaying Proper Respect for
     Authority
 G.  Maintaining Own Equipment
 H.  Displaying Honesty and Integrity
 I.  Maintaining Proper Physical Fitness
 J.  Developing Own Job/Soldiering Skills
 K.  Maintaining Proper Military Appearance
 L.  Controlling Own Behavior Related to Personal Finances,
     Drugs/Alcohol, and Aggressive Acts

*New leadership/supervisory dimensions for second tour.


To accomplish the revision, a critical incident analysis workshop was
conducted with approximately 25 officers and NCOs in each of the nine target
jobs (Batch A MOS) to generate examples of effective, average, and ineffective
second-tour MOS-specific job performance. Table 8-3 shows characteristics of
the participants in the workshops. The number of incidents generated for each
MOS ranged from 56 to 236 with an average of 180 (Table 8-4). The incidents
were then categorized by the project staff, using the first-tour MOS-specific
category system as a starting framework. If a second-tour incident did not fit
into an existing first-tour category, a new category was introduced. This
procedure yielded information regarding what specific category additions or
deletions were necessary to describe critical second-tour performance
comprehensively.

Almost all of the first-tour MOS-specific performance categories were
judged to be appropriate for second-tour MOS. The next step was to examine
the content of the incidents to determine whether the performance requirements
were appreciably different for second-tour than for first-tour soldiers. If
comparisons of the first- and second-tour critical incidents indicated that
more was expected of second-tour soldiers than of their first-tour
counterparts or that second-tour soldiers were responsible for knowing how to
operate and maintain more/different pieces of equipment, such distinctions
were incorporated into the second-tour scale anchors.


Table 8-3

Participants in Second-Tour Workshops for Generation of MOS-Specific
Critical Incidents, by MOS(a)

                          11B    13B    19E    31C    63B 64C(b)    71L  91A/B    95B

Site: Fort Bragg           11      -     14      1      6      7      8     11     23
      Fort Carson           4      -      4      3      8      4     14      1     15
      Fort Knox             -      -     27      -      -      -      -      -      -
      Fort Hood             -     14      -      -      -     20      -      -      -
      Fort Gordon           -      -      -     17      -      -      -      -      -
      Fort Sam Houston      -      -      -      -      -      -      -      8      -

      Total                15     14     45     21     14     31     22     20     38

                          11B    13B    19E    31C    63B 64C(b)    71L  91A/B    95B

Rank: NCOs
      E-4                   -      2      -      -      -      -      -      -      -
      E-5                   -      9      3      -      -      1      5      -      8
      E-6                   -      9     11     12      -     19      1      4      -
      E-7                   -      5      5      1      1      5      -      3      -
      E-8                   -      -      -      -      -      1      -      1      -

      Officers
      O1                    -      -      8      -      1      -      3      1      3
      O2                    -      -      7      1      -      1      4      -     12
      O3                   14      -     11      2     12      4      8      8     12
      O4                    1      -      -      1      -      -      1      3      3

Gender: Male               15     14     45     18     13     29     17     18     34
        Female              -      -      -      2      1      2      5      2      4

                          11B    13B    19E    31C    63B 64C(b)    71L  91A/B    95B

Mean Time in Army        6.68  11.97   7.74  11.28   6.87  12.06   5.26   7.58   6.40

Mean Time in
Supervisory Position     5.17   8.09   5.29   7.31   5.03   7.49   3.80   5.01   4.87

(a) Many of these participants also generated Army-wide critical incidents.
(b) MOS 64C subsequently became MOS 88M.


Table 8-4

Second-Tour MOS-Specific Critical-Incident Workshops:
Numbers of Incidents Generated, by MOS(a)

                          Number of
MOS            N          Incidents

11B           15             151
13B           14              56
19E           45             236
31C           21             212
63B           14             180
64C(b)        31             184
71L           22             149
91A           20             206
95B           38             234

(a) Many of these participants also generated Army-wide critical incidents.
(b) MOS 64C subsequently became MOS 88M.


For several MOS, the second-tour incidents suggested that MOS-specific
supervisory performance categories should be developed. However, in
developing categories, care was taken not to duplicate the Army-wide
leadership/supervision dimensions and to reflect aspects of supervision that
were relevant only to the particular job in question. A total of six
MOS-specific supervisory dimensions distributed over five MOS were generated.

For each of the nine MOS, two scale revision workshops were conducted
with 10-14 participants (officers and NCOs) in each. Participants considered
the validity of the dimension anchors for evaluating second-tour
effectiveness, and whether the proposed dimensions were relevant and inclusive
of all MOS-specific performance components. Scales were revised if appropriate.

For each MOS a third, or retranslation, workshop was also conducted with
approximately 20 officers and NCOs. For 92 percent of the revised incidents,
more than 75 percent of the sample categorized them as intended. The
dimensions for each Batch A MOS are shown in Table 8-5.
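The retranslation criterion reported above (the share of incidents that more
than 75 percent of the judges sorted into the intended dimension) can be
computed along the following lines. This is a hedged sketch: the function
name, the incident identifiers, and the judge assignments are hypothetical
illustrations, not project data.

```python
def retranslation_summary(assignments, intended, threshold=0.75):
    """Share of incidents that more than `threshold` of judges placed as intended."""
    passed = 0
    for incident, judged in assignments.items():
        # Proportion of judges whose dimension matches the intended one.
        agreement = sum(1 for d in judged if d == intended[incident]) / len(judged)
        if agreement > threshold:
            passed += 1
    return passed / len(assignments)


# Hypothetical example: four incidents, each sorted by five judges.
intended = {
    "inc_1": "Navigation",
    "inc_2": "Patrolling",
    "inc_3": "Navigation",
    "inc_4": "Patrolling",
}
assignments = {
    "inc_1": ["Navigation"] * 5,
    "inc_2": ["Patrolling"] * 4 + ["Navigation"],
    "inc_3": ["Navigation", "Patrolling", "Navigation", "Patrolling", "Navigation"],
    "inc_4": ["Patrolling"] * 4 + ["Navigation"],
}

share = retranslation_summary(assignments, intended)
print(share)
```

In this toy example three of the four incidents clear the 75 percent
agreement bar; the third incident, with only three of five judges agreeing,
does not.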
Table 8-5

MOS-Specific Dimensions for Second Tour

11B:
    Maintaining and Accounting for Equipment and Weapons
    Supervising Soldiers in the Field
    Leading the Team
    Navigation
    Use of Organic Weapons and Equipment
    Field Sanitation, Personal Hygiene, and Personal Safety
    Fighting Positions
    Avoiding Enemy Detection
    Operating Radio Set
    Reconnaissance
    Guard and Security Duties
    Prisoners of War
    Proficiency in Battle

13B:
    Loading Out Equipment
    Driving and Maintaining Vehicles, Howitzers, and Equipment
    Transporting, Sorting, Stowing, and Preparing Ammunition
    Preparing for Occupation/Emplacing Howitzer
    Setting up Communications
    Gunnery
    Loading/Unloading Howitzer
    Receiving and Relaying Communications
    Recording/Record Keeping
    Position Improvement

19E:
    Maintaining Tank, Tank System, and Associated Equipment
    Driving and Recovering Tanks
    Stowing Ammunition Aboard Tanks
    Loading/Unloading Weapons
    Maintaining Weapons
    Engaging Targets with Tank Weapon Systems
    Operating Communications Equipment
    Preparing Tanks for Field Problems
    Assuming Supervisory Responsibilities in Absence of Tank Commander

31C:
    Inspecting and Servicing Equipment
    Installing Equipment
    Operating Communication Devices
    Preparing Reports
    Maintaining Security
    Providing Safe Transportation
    Preparing for Movement
    Managing the RATT Rig

63B:
    Inspecting and Testing Equipment Problems
    Checking Repairs Made by Other Mechanics
    Troubleshooting
    Performing Preventive Maintenance Checks and Services
    Repair
    Using/Accounting for Tools and Test Equipment
    Using Technical References
    Equipment Operation
    Safety Mindedness
    Administrative Duties
    Determine Task Requirements
    Recovery

71L:
    Preparing, Typing, and Proofreading Documents
    Processing and Distributing Documents
    Maintaining Office Resources
    Establishing and/or Maintaining Files IAW MARKS
    Correspondence Management
    Preparing and Safeguarding Classified Materials
    Providing Customer Service

88M:
    Driving Vehicles
    Vehicle Coupling
    Checking and Maintaining Vehicles
    Using Maps/Following Proper Routes
    Loading and Transporting Cargo
    Loading and Transporting Personnel
    Parking and Securing Vehicles
    Performing Administrative Duties
    Recovering Vehicles
    Safety-Mindedness
    Performing Dispatcher Duties

91A:
    Maintaining and Operating Army Medical Vehicles and Equipment
    Maintaining Accountability of Medical Supplies and Equipment
    Keeping Medical Records
    Arranging for Transportation and/or Transporting Injured Personnel
    Dispensing Medications
    Preparing and Maintaining Field Site or Clinic Facilities in the Field
    Providing Routine and Ongoing Patient Care
    Responding to Emergency Situation
    Providing Health Care and Health Maintenance Instruction to
      Army Personnel

95B:
    Traffic Control and Enforcement
    Providing Security
    Investigating Crimes and Making Apprehensions
    Patrolling
    Leading the Team in Tactical Environment
    Promoting Public Image of Military Police
    Interpersonal Communications Skills
    Responding to Medical Emergencies
    Navigation
    Avoiding Enemy Detection
    Use of Weapons and Other Equipment


Job Analysis Interviews

The final job analysis method consisted of short (one-hour) structured
interviews that were conducted with small groups (5-8 people) of NCOs in each
of the nine jobs. They were asked about the number or percentage of sergeants
who would probably be in different duty positions, and about the normal
activities of those individuals. They were also asked to indicate how many
hours per week those individuals would spend on each of nine supervisory
activities and each of two general areas of actual task performance, and how
important each of those 11 aspects of the job is for the second-tour NCO.
This information was used primarily to gauge the relative importance of, and
time spent on, leadership/supervision versus technical activities.


RESULTS 

Major Differences Between First- and Second-Tour MOS

As  defined  by  the  task-based  descriptions,  the  additional  second-tour 
tasks  are  more  difficult  and  complex,  but  are  of  the  same  general  content  as 
the  first-tour  tasks.  The  addition  of  tasks  also  caused  several  of  the 
technical  clusters  to  split  into  more  highly  differentiated  task  subgroups. 

Another important difference between the first- and second-tour task
domains is that MOS-specific leadership clusters were added or expanded in
every MOS. In seven of the MOS a new cluster was formed to represent tasks
involving either tactical operations leadership or administrative supervision,
while in the other two MOS such clusters were greatly expanded due to the
addition of new tasks.

As mentioned previously, analysis of the Army-wide critical incidents
led to the addition of three dimensions reflecting increased supervisory/
leadership responsibilities across all jobs. These three dimensions in effect
replaced a single first-tour leadership dimension. All nine of the other
Army-wide dimensions that had been developed for first-tour soldiers were
replicated for the second-tour job.

Analysis of the MOS-specific critical incidents suggested the retention
of all but two of the first-tour dimensions; in three cases, a single first-
tour dimension was split into two. Of the 85 first-tour dimensions, 38 (45%)
were unchanged. The added technical and supervisory responsibilities
for second tour resulted in substantial changes to 44 (52%) of the dimensions,
and additional MOS-specific supervisory dimensions were developed for five of
the nine MOS. The MOS-specific supervision/leadership scales for these five
MOS are summarized in Table 8-6.

Thus, although the MOS vary in the extent to which supervisory/leadership
responsibilities constitute new dimensions of job content, the second-tour
soldiers in all MOS are responsible for the performance of their subordinates.
The technical content of the jobs is, for the most part, similar to the
content of first-tour jobs, although higher proficiency is often expected, and
more difficult tasks are frequently added.


Table 8-6

Supervisory Performance Categories for Second-Tour
MOS-Specific Scales

MOS                                     Performance Category Name

11B Infantryman                         Supervising Soldiers in the Field
                                        Leading the Team
13B Cannon Crewman                      None
19E Armor Crewman                       Assuming Supervisory Responsibilities
                                          in Absence of Tank Commander
31C Single Channel Radio Operator       Managing the RATT Rig
63B Light Wheel Vehicle Mechanic        Checking Repairs Made by Other
                                          Mechanics
71L Administrative Specialist           None
88M Motor Transport Operator            None
91A/B Medical Specialist/Medical NCO    None
95B Military Police                     Leading the Team in a Tactical
                                          Environment


Specific  Nature  of  the  Leadership/Supervision  Component 

As  a  category  of  job  content,  leadership  and  supervision  represent  a 
sizable  proportion  of  the  junior  NCO  position.  For  example,  as  judged  by  the 
previously  described  job  analysis  interview  panels,  from  35  to  80  percent  of 
the  NCO's  time  is  spent  on  supervisory  activities. 

Given the substantial nature of the supervision/leadership components,
the next step was to attempt a more detailed description of their content in
terms of specific dimensions. An item pool was created by first using project
staff judgments to identify the tasks in each MOS task domain that represented
leadership or supervision content. This total list, summed over the nine
Batch A MOS, was edited for obvious redundancy and then combined with the 46
items from the Supervisory Responsibilities Questionnaire. This produced a
total pool of 341 items (tasks).

The pool of 341 individual task items was then content clustered by each
of 12 judges selected from the Project A staff. Given the target that the
number of content clusters should be between 5 and 15, if possible, each judge
sorted the task items into categories and wrote a brief definition for each
category (i.e., dimension). Consequently, there were 12 cluster solutions
based on individual expert judgment.

Next, the degree of agreement among all 12 judges, in terms of how every
pair of items should be clustered, was used as input to an empirical cluster
analysis. The results of the cluster analysis were compared to the expert
judgment solutions and a synthesized description of specific content
dimensions was written by the project staff. In other words, a pooled
solution was obtained by expert judgment. The results of this pooled solution
are shown in Figure 8-1.


1.  Planning Operations

Activities that are performed in advance of major operations of a tactical
or technical nature. That is, planning for, getting ready for, and
developing orders for various kinds of team operations, whether it be
combat, support, or technical operations. It is the activity that comes
before actual execution out in the field or work place.

2.  Directing/Leading Teams

The tasks in this category are concentrated in the combat and military
police MOS. They involve the actual direction and execution of combat and
security team activities. They occur out in the field and are heavily
dependent on MOS-specific skills. Leading reconnaissance teams, setting up
offensive and defensive positions, carrying out a fire mission, directing
the clearing of mine fields, etc. would all be part of this category. They
require "real-time" decisionmaking under pressure.

3.  Monitoring/Inspecting

This cluster includes interactions with subordinates that seem to involve
keeping an operation going once it has been initiated, such as checking to
make sure that everyone is carrying out their duties properly, assisting
people to overcome problems, making sure everyone has the right equipment;
monitoring or evaluating the status of equipment readiness, supply levels,
completeness of written reports, adequacy of current operating procedures,
etc. This is a non-combat or non-crisis set of activities.

4.  Individual Leadership

The content of the tasks in this cluster reflects attempts to influence the
motivation and goal direction of subordinates by means of goal setting,
interpersonal communication, sharing hardships, building trust, etc.

5.  Acting as a Model

This dimension is not tied to a specific task content but refers to the NCO
modeling the correct performance behavior whether it be technical task
performance under adverse conditions, or exhibiting appropriate military
bearing. The NCO sets the example.

6.  Counseling

A one-on-one interaction with a subordinate during which the NCO provides
support, guidance, assistance, and feedback on specific performance or
personal problems that the soldiers might be experiencing. It includes
counseling on problems of a disciplinary nature.

7.  Communication with Subordinates, Peers, and Supervisors

The tasks in this category deal with composing specific types of orders,
briefing subordinates on things that are happening, and communicating
information up the line to superiors, as well as to peers. Information is
disseminated in both written and oral formats.

8.  Training Subordinates

A very distinct cluster of tasks that describe the day-to-day role of the
NCO as a trainer for individual subordinates. When such tasks are being
executed, they are clearly identified as instructional (as distinct from
evaluations or disciplinary actions). Involves scheduling, planning, and
conducting training.

9.  Personnel Administration

This category is made up of "paperwork" or administrative tasks that
involve actually doing performance appraisals, making or recommending
various personnel actions, keeping and maintaining adequate records, and
following standard operating procedures for Army personnel practices.


Figure 8-1. Supervision/Leadership Task Categories Obtained by
Synthesizing Expert Solutions and Empirical Cluster Analysis
Solution
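The pooling step described in the text, in which agreement among the judges
on every pair of items served as input to an empirical cluster analysis, could
be sketched roughly as follows. This is an illustration only, not the
procedure the project staff actually ran: the source does not name the
clustering algorithm, so the average-linkage merging, the 0.5 stopping
threshold, the function names, and the five toy items sorted by three judges
are all assumptions.

```python
from itertools import combinations


def agreement_matrix(sorts, items):
    """Proportion of judges who placed each pair of items in the same cluster."""
    n_judges = len(sorts)
    agree = {}
    for a, b in combinations(items, 2):
        together = sum(1 for s in sorts if s[a] == s[b])
        agree[frozenset((a, b))] = together / n_judges
    return agree


def average_linkage(items, agree, min_similarity):
    """Greedily merge clusters while the best average agreement is high enough."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            sim = sum(agree[frozenset(p)] for p in pairs) / len(pairs)
            if best is None or sim > best[0]:
                best = (sim, i, j)
        sim, i, j = best
        if sim < min_similarity:  # assumed stopping rule
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical sorts: each judge maps an item to that judge's own cluster label.
items = ["plan_patrol", "write_order", "inspect_gear", "check_supply",
         "counsel_soldier"]
sorts = [
    {"plan_patrol": 1, "write_order": 1, "inspect_gear": 2,
     "check_supply": 2, "counsel_soldier": 3},
    {"plan_patrol": 1, "write_order": 1, "inspect_gear": 2,
     "check_supply": 2, "counsel_soldier": 2},
    {"plan_patrol": 1, "write_order": 2, "inspect_gear": 3,
     "check_supply": 3, "counsel_soldier": 4},
]

agree = agreement_matrix(sorts, items)
clusters = average_linkage(items, agree, min_similarity=0.5)
print(clusters)
```

Working from pairwise agreement rather than from cluster labels matters
because each judge's labels are arbitrary; only whether two items were sorted
together is comparable across judges.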
Chapter 9

DEVELOPMENT OF SECOND-TOUR JOB PERFORMANCE MEASURES


As described previously, there was considerable job analysis information
on which to base second-tour performance measurement. For each MOS, 30
technical (MOS-specific and common) tasks and 15 supervisory tasks were
selected to represent the task clusters and all 45 selected tasks were rank
ordered in terms of their overall importance to the MOS. The critical
incident analysis yielded a portrayal of each MOS in terms of its general and
specific critical performance components in both technical performance and
leadership, and the series of job analysis interviews yielded a rough estimate
of the relative importance and time spent for technical vs. supervisory
activities for each MOS. Cluster analyses were used to further explore the
specific dimensions of supervisory/leadership performance.

Given available resources, constraints on testing time, guidance from
the literature, previous Project A work, and the second-tour job analysis
results, a potential set of measurement methods was identified and reviewed by
the project staff and the Scientific Advisory Committee. Some of the
measurement methods had been used for the first tour and some were newly
developed.

As indicated by the second-tour job analyses, there is considerable
overlap in job content between first tour and second tour, except that the
core technical tasks become more complex and significant components of
leadership and supervision are added. Consequently, a number of first-tour
measurement methods were modified for second-tour use, and several new
measures of supervision and leadership were added.

To accommodate the new supervisory measures, assessment of technical task
knowledge and performance (i.e., hands-on and job knowledge tests) was
allotted less time than in first-tour performance assessment. Reducing
assessment time was judged to be better than eliminating either measurement
strategy because (a) highly reliable job knowledge tests can be written for
almost any task, and (b) the hands-on tests were designed to have a high
degree of content validity. For the job knowledge tests, test time was
reduced by using fewer items for each task. This strategy is not feasible
with hands-on tests because the scorable steps within task tests are too
interdependent to be selectively eliminated. Consequently, fewer tasks were
tested in a hands-on mode relative to the number of tasks so tested for
first-tour soldiers.

Three data collections were associated with the development of the
second-tour criterion measures. These are outlined in Table 9-1. The table
lists the types of individuals involved (i.e., SMEs or job incumbents),
testing/workshop locations, and the purpose of each data collection.


Table 9-1

Data Collection Efforts in Second-Tour Criterion Development

Pilot Tests

Location:      Proponent Schools
Participants:  4 E6 SMEs and 5 E5 incumbents (per MOS)
Purpose:       First tryouts of hands-on tests
               Initial generation of role-play exercises
               Preliminary Situational Judgment Test (SJT) workshops


Location:      USAREUR, Fort Bragg, Fort Hood
Participants:  Primarily second-tour incumbents; 41 to 61 soldiers per MOS
Purpose:       Field testing of hands-on tests
               First administration of job knowledge tests
               Administration of experimental version of SJT
               Administration of experimental versions of counseling
                 role-plays
               Development of training role-play
               Administration of draft versions of the second-tour Personnel
                 File Form, the second-tour performance rating scales,
                 the Army Job Satisfaction Questionnaire and marker
                 instrument (MSQ), and two versions of the Combat
                 Performance Prediction rating scale


Location:      Fort Campbell, Fort Devens, Fort Sam Houston, US Army
                 Sergeants Major Academy (USASMA)
Participants:  Senior NCOs (n=b6); students and instructors from USASMA
                 (n=91)
Purpose:       Generate situations and response alternatives
               Gather effectiveness data on response alternatives
               Review SJT items for realism and appropriateness

SECOND-TOUR  PERFORMANCE  CRITERIA  OBTAINED  BY 
MODIFYING  FIRST-TOUR  MEASURES 

Measures of Technical Task Performance

Because, by doctrine, Skill Level 2 soldiers (pay grade E-5) are also
responsible for Skill Level 1 (covers pay grades E-1 through E-4) tasks, the
technical tasks selected for testing first- and second-tour soldiers
overlapped to a substantial degree. Development of new job knowledge and
hands-on tests for the non-overlapping tasks was modeled after the procedures
used for the first-tour tests. The hands-on tests were submitted to pilot
testing and a field test before being finalized for administration to the
second-tour sample. The first administration of the job knowledge tests took
place during the field test data collection.

With  respect  to  the  Job  knowledge  tests,  item  analyses  on  the  field  test 
data  were  used  to  identify  items  which  required  revision  and  to  reduce  the 
number  of  items  so  that  the  tests  could  be  administered  in  one  hour.  Similar¬ 
ly,  field  test  results  were  used  to  identify  needed  revisions  to  the  instruc¬ 
tions  and  scorable  steps  of  the  hands-on  tests.  Also,  the  field  test 
administration  provided  the  information  for  determining  which  hands-on  tests 
were  to  be  administered  and  which  were  to  be  dropped. 

Note that the Multipurpose Arcade Combat Simulator that was added to
the criterion measure set for first-tour MOS 11B and 95B soldiers was also
administered to second-tour soldiers in these MOS.

Rating Scales

As  described  in  the  section  on  the  second-tour  Job  analysis,  the 
second-tour  Army-wide  and  MOS-specific  performance  rating  scales  were  devel¬ 
oped  using  the  first-tour  scales  as  a  starting  point.  Information  generated 
through  the  second-tour  Job  analysis  was  used  to  revise  these  instruments  to 
make them suitable for second-tour soldiers. For example, the Army-wide "NCO
potential" scale was replaced with a "senior NCO potential" scale.

Furthermore, a set of scales was added to tap supervisory performance
dimensions that were identified in the second-tour job analysis (see Figure
8-1). A list of the areas covered in the rating scales and an example of one
of these scales are provided in Figure 9-1.

The  Army-wide,  MOS-specific,  and  supervisory  performance  rating  scales 
were  administered  during  the  second-tour  field  test.  No  changes  to  the  scales 
were  made  as  a  result  of  analysis  of  those  data. 

A panel of SMEs indicated that the Combat Performance Prediction rating
scales as revised for first-tour soldiers would also be applicable for
second-tour soldiers. All of the rating scales intended for use with second-tour
soldiers were administered during the field tests listed in Table 9-1.


*Army  Regulation  611-201,  Enlisted  Career  Management  Fields  and  Military 
Occupational  Specialties. 
Scftit Smi

• ACTING AS A ROLE MODEL
• COMMUNICATION
• PERSONAL COUNSELING
• MONITORING SUBORDINATE PERFORMANCE
• ORGANIZING MISSIONS/OPERATIONS
• PERSONNEL ADMINISTRATION
• PERFORMANCE COUNSELING/CORRECTING


ACTING  AS  A  ROLE  MODEL  FOR  SUBORDINATES 


Motivates subordinates to perform effectively through personal example,
including demonstrating high standards of military appearance, bearing, and
courtesy; is a model supervisor for subordinates to look up to by
demonstrating exemplary behavior as a soldier.


Falls below standards and expectations for performance in the category
"Acting as a Model" compared to soldiers at same experience level.

(1)  (2)

Meets standards and expectations for performance in the category
"Acting as a Model" compared to soldiers at same experience level.

(3)  (4)  (5)

Exceeds standards and expectations for performance in the category
"Acting as a Model" compared to soldiers at same experience level.

(6)  (7)


Figure 9-1. Example of Supervisory/Leadership Performance Ratings.
Personnel File Form II

A  Personnel  File  Form  suitable  for  second-tour  soldiers  was  developed 
by  reviewing  the  contents  of  the  Personnel  File  Form  for  first-tour  soldiers 
with  officers  and  NCOs  who  were  representatives  of  the  Army's  Military 
Personnel  Center.  In  addition  to  the  information  gathered  on  the  first-tour 
version  of  this  instrument,  the  second-tour  form  elicits  information  related 
to  the  soldier's  promotion  and  reenlistment  background.  Three  categories  were 
added  to  the  form  in  an  effort  to  reflect  the  additional  administrative 
actions  appropriate  for  soldiers  in  their  second  tour.  These  categories  were 
Education,  Promotion  Boards,  and  Reenlistment  waivers.  Army  Regulations  were 
reviewed  to  identify  information  available  on  the  Promotion  Board  Worksheet, 
and  officers  and  NCOs  who  served  on  promotion  boards  were  interviewed  to 
answer  questions  about  the  NCO  promotion  process  to  E-5  and  above.  A  draft 
version  of  the  second-tour  Personnel  File  Form  was  administered  during  the 
second-tour  field  test.  Only  minor  changes  were  made  to  the  form  as  a  result 
of  field  test  data  analyses. 


NEW  CRITERION  MEASURES  FOR  THE  ASSESSMENT  OF 
SECOND-TOUR (NCO) PERFORMANCE

Based  on  a  review  of  the  literature  and  a  careful  consideration  of  the 
feasibility  of  additional  measurement  methods,  two  new  methods  were  developed 
for  assessing  second-tour  NCO  Job  performance:  role-play  exercises  and  a 
situational  judgment  test.  The  role-play  exercises  were  intended  to  assess 
the  one-on-one  interpersonal  skills  required  for  counseling  and  training 
subordinates,  whereas  the  Situational  Judgment  Test  (SJT)  was  intended  to 
cover  as  broad  a  range  of  important  supervisory  skills  as  possible  within  the 
constraints of a paper-and-pencil format.

Role-Play Exercises

Three  role-play  simulations  were  developed: 

• Counseling of a subordinate with personal problems.

• Counseling of a subordinate with performance problems.

• Remedial training with a subordinate.

These  particular  simulations  were  developed  because  they  cover  three  of  the 
most  critical  tasks  in  the  supervisory  component  in  the  NCO  Job,  as  identified 
in  the  Job  analysis. 

The  general  format  for  the  simulations  is  for  the  examinee  to  play  the 
role  of  a  supervisor.  The  examinee  is  prepared  for  the  role  with  a  one-page 
description  of  the  situation  that  he  or  she  will  be  asked  to  handle.  The 
subordinate  is  played  by  a  confederate  who  is  trained  to  act  out  a  detailed 
role.  This  confederate  also  has  responsibility  for  scoring  the  performance  of 
the  supervisor  (i.e.,  examinee). 

The  information  and  data  for  the  development  of  the  role-plays  came  from 
several  sources,  including  (a)  Army  NCO  training  materials,  (b)  the  second- 
tour  pilot  tests,  and  (c)  the  second-tour  field  tests.  The  initial  content 
of  the  counseling  exercises  was  generated  during  the  first  two  second-tour 
pilot tests. Several promising scenarios were selected for further
development.


The initial developmental steps involved the drafting of four documents:
(a) a description of the supervisor's role, (b) a short description of the
subordinate's role, (c) a set of detailed instructions for playing the part of
the subordinate, and (d) a performance rating instrument. Project staff
drafted a checklist of behaviors applicable to performance in a counseling
situation, to be used as a rating device. This checklist was generated using
NCO instructional materials provided by the Army. Participants in subsequent
pilot tests tried out the role-plays and provided input for refining them.

This was an iterative process, with participants in the later pilot tests
trying out role-play materials that had already gone through several
revisions. These tryouts involved considerable shadow-scoring as a means of
evaluating the reliability of the rating checklist.

During the course of the pilot tests, development efforts became focused
on two counseling exercises, one in which the subordinate had a personal
problem and the other in which the subordinate exhibited a performance-related
disciplinary problem. Also during this time the performance checklist evolved
into a rating scale format. Anchors for three possible ratings were developed
for each performance behavior. The final set of behaviors to be rated
underwent considerable refinement.


The first formal tryout for the counseling exercises was during the
second-tour field tests. In this setting, the subordinate roles were played
by NCOs who were also responsible for hands-on scoring. Each NCO was trained
on one of the two counseling exercises. A maximum of one-half day was
available for training. During this training, the NCOs learned how to play
the roles and how to use the rating scales. During the course of training,
NCOs took turns playing the subordinate and supervisor roles. In order to
evaluate interrater reliability, at least two raters evaluated each soldier's
performance in the simulation exercises. No changes to the role-plays were
considered necessary as a result of analysis of the field test data.
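
The report does not say which interrater reliability index was computed. A minimal sketch, assuming two raters who each score the same examinees on the same 3-point behavior scales, is to look at exact agreement across behaviors and at the correlation between the raters' total scores. The function names and the ratings below are illustrative only.

```python
from statistics import mean, pstdev


def exact_agreement(ratings_a, ratings_b):
    """Proportion of rated behaviors on which both raters gave the same rating."""
    pairs = [(a, b)
             for row_a, row_b in zip(ratings_a, ratings_b)
             for a, b in zip(row_a, row_b)]
    return sum(a == b for a, b in pairs) / len(pairs)


def total_score_correlation(ratings_a, ratings_b):
    """Pearson correlation between the two raters' total scores per examinee.

    Assumes both sets of totals have nonzero variance.
    """
    totals_a = [sum(row) for row in ratings_a]
    totals_b = [sum(row) for row in ratings_b]
    ma, mb = mean(totals_a), mean(totals_b)
    cov = mean([(a - ma) * (b - mb) for a, b in zip(totals_a, totals_b)])
    return cov / (pstdev(totals_a) * pstdev(totals_b))


# Made-up ratings: three examinees, three behaviors, ratings of 1 to 3.
rater_a = [[3, 2, 1], [2, 2, 2], [1, 1, 2]]
rater_b = [[3, 2, 2], [2, 2, 2], [1, 2, 2]]
print(exact_agreement(rater_a, rater_b))
print(total_score_correlation(rater_a, rater_b))
```

With only three anchors per behavior, exact agreement can look high by chance, so a correlation on total scores is a useful second check.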


The  development  of  the  training  role-play  was  somewhat  different.  The 
content  of  the  training  tasks  was  determined  by  having  pilot  test  partici¬ 
pants  examine  the  first-tour  technical  task  domains  for  their  MOS  and  nominate 
tasks  that  met  the  following  criteria: 


(1)  Is  relatively  complex. 

(2)  Should  allow  the  trainer  to  exhibit  his  or  her  training  skill. 

(3)  Must  have  standardized  equipment  and  procedures  across  locations. 

(4)  Has  minimal  performance  differences  within  or  across  MOS. 

(5) Can be trained in 15 to 20 minutes.

(6) Should not be a task that is tested hands-on for second-tour
soldiers.
The review indicated that no MOS-specific technical task met all six criteria.
The tasks for which minimal MOS differences were expected were too simplistic.
For other tasks, large differences in task familiarity were expected both
within and across MOS.

Consideration then turned to common soldiering tasks (i.e., first aid,
weapons) that might require remedial training. The most likely candidates
were associated with drill and ceremony activities. This was a promising area
because all soldiers learn drill and ceremonies in basic training, most units
perform this function daily, the procedures are the same across posts, and
NCOs expressed confidence that this would be an appropriate source of training
simulation "tasks." Two drill and ceremony behaviors were selected: the
about face and the hand salute. As with the counseling role-plays, materials
were prepared to specify the subordinate and supervisor roles in the training
exercise and to draft a rating form. Again, the behaviors to be rated were
derived from trainer manuals used by the Army. The iterative process of
trying out the role-play and revising took place during the field test data
collections.


Figures 9-2, 9-3, and 9-4 show the three role-play scenarios, an excerpt
from one of the three rating forms, and an outline of the training that would
be provided to all subordinate scorers. The plan for administering the
role-plays to the second-tour personnel in the CVII sample involved the use of
civilians, hired and trained specifically for this data collection, as the
role-play confederates. It was decided that the most suitable role-player
candidates would be young men with prior military experience. Once hired,
role-players were to be given at least 3 days of training in a centralized
location.

Prior  to  administration  to  the  validation  sample,  the  role-play  exercise 
materials  were  submitted  to  the  U.S.  Army  Sergeants  Major  Academy  for  a 
proponent  review.  The  reviewers  found  the  exercises  to  be  an  appropriate  and 
fair  assessment  of  supervisory  skills,  and  did  not  request  any  revisions.  At 
this  point,  the  role-play  simulations  were  deemed  ready  for  administration  to 
the CVII sample.


Situational  Judgment  Test  (SJT) 

The purpose of the SJT is to evaluate the effectiveness of judgments
about what one should do in typical supervisory problem situations. A
critical incident methodology was used to generate situations for inclusion in
the SJT, and the SMEs who generated situations and response options were pilot
test participants. SMEs were provided with the taxonomy of supervisory/
leadership behaviors generated by the second-tour job descriptions and were
given the following criteria for "good" situations:

(1) It is challenging. Situation should be difficult enough so
that not everyone would be likely to know the best response.

(2) It is realistic.

(3) There is a best response, or at least some responses are
better than others.
PERSONAL COUNSELING ROLE-PLAY SCENARIO


Supervisory Problem:

PFC Brown is exhibiting declining job performance and personal appearance.
Recently, Brown's wall locker was left unsecured. You have decided to counsel
this soldier.

Subordinate  Role: 

• Soldier is having difficulty adjusting to life in Korea and is
experiencing financial problems.

• Reaction to counseling is initially defensive, but will calm
down if not threatened. Will not discuss personal problems
unless prodded.

COUNSELING  ROLE-PLAY  SCENARIO 

Supervisory  Problem: 

There is convincing evidence that PFC Smith lied to get out of coming to work
today. This soldier has arrived late to work on several occasions and has been
counseled for lying in the past. You have instructed Smith to come to your
office immediately.

Subordinate  Role: 

• Soldier's work is generally up to standards, which seems to
justify occasional "slacking off." Slept in to nurse a hangover
and lied to cover up.

• Initial reaction to counseling is a very polite denial of lying.

• If supervisor insists, soldier admits guilt, then whines for
leniency.


TRAINING  ROLE-PLAY  SCENARIO 


Supervisory  Problem: 

The commander will be observing the unit practice formation in 30 minutes. PVT
Martin, although highly motivated, is experiencing problems with the hand salute
and about-face.

Subordinate  Role: 

• Feelings of embarrassment contribute to the soldier's clumsiness.
• Soldier makes very specific mistakes.


Figure 9-2. Supervisory role-play scenarios.
ROLE-PLAY EXERCISES
EXAMPLE  OF  RATING  SCHEME 


1.  Develops  rapport  at  the  start  of  the  session. 

3 = Opens the interview in a pleasant, nonthreatening
manner.

2 = Opens the interview in a generally nonthreatening
manner but uses a tone of voice or non-verbal
actions that leave the subordinate feeling somewhat
defensive.

1 = Opens the interview in a hostile or threatening
manner, leaving the subordinate feeling very
defensive from the start.


2.  States  the  purpose  of  the  counseling  session  clearly  and 
concisely. 

3 = Outlines all topics to be covered (e.g., the purpose
is to discuss the wall locker that was left open
last night, any problems the subordinate may be having
and what might be done to resolve them, etc.).

2 = States at least one general topic to be discussed
(e.g., says the purpose is to talk about the
subordinate's recent poor performance).

1 = Fails to state a purpose for the session; instead,
jumps directly into the problems.


Figure  9-3.  Example  of  role-play  exercise  rating  scheme. 
A. General briefing and orientation.

B.  Distribute  supervisor's  role,  subordinate's  role  and  how  to  play  the 
subordinate's  role.  Explain  these  and  have  scorers  read  the  materials 
silently. 

C. Summarize the roles. Provide step-by-step instructions about how to play
the  subordinate's  role. 

D. Distribute the rating scales, explain the rating system, and have
trainees  read  the  scales  silently. 

E. Review each scale separately, detailing differences between a "3"
versus a "2" versus a "1."

F.  Break  group  into  pairs  and  have  each  pair  practice  the  role-play  on  their 
own.  The  purpose  here  is  to  familiarize  trainees  with  the  exercise. 

G.  Bring  everyone  back  together.  Select  two  trainees,  one  to  play  the 
supervisor  and  the  other  to  play  the  subordinate.  The  other  trainees 
observe  and  score  the  role  play. 

H. The group discusses their ratings and resolves discrepancies. Feedback is
provided  on  how  well  the  trainee  played  the  subordinate's  role. 

I. Steps G and H are repeated until each trainee has had an opportunity to
play the subordinate's role.

J.  Break  the  group  into  triads.  Continue  practicing  playing  the  subor¬ 
dinate's  role,  evaluating  the  supervisor's  performance,  and  discussing  the 
ratings.  Trainer  circulates  among  the  groups. 


Figure  9-4.  Role-player  training. 
(4) It provides sufficient detail to help the supervisor make a
choice  between  possible  actions. 

(5) A response to the situation can be communicated in a few
sentences.

(6) It relates to the second-tour supervisory duties in any MOS, not
just one MOS (i.e., it is an Army-wide situation). Some workshop
participants were also asked to write MOS-specific situations.

Response  options  were  developed  through  a  combination  of  input  from 
pilot  test  SMEs  and  incumbents  at  the  sergeant  level  from  the  field  tests. 

SMEs  wrote  short  answers  (1-3  sentences)  to  the  situations  describing  what 
they  would  do  to  respond  effectively  to  each  situation.  Several  strategies 
were  used  to  elicit  response  options,  including  written  alternatives  generated 
by  individuals  and  alternatives  arising  out  of  small  group  discussions.  The 
written  short  answers  were  content  analyzed  by  research  staff  and  additional 
response alternatives generated. Table 9-2 presents the workshops completed
and  the  work  accomplished  in  generating  the  initial  set  of  236  situations. 

During the last four workshops, seven to nine E-5 to E-7 SMEs from each
of four MOS scaled the effectiveness levels of 34 responses to 11 situations.
The rationale for generating the preliminary effectiveness scale was to obtain
initial data on possible across-MOS differences in preferred supervisory
style. The grand means of response effectiveness levels differ somewhat by
MOS (Table 9-3), and the correlations between mean MOS ratings (Table 9-4)
show moderately high relationships (rs = .57 to .73; N = 34).
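
The statistics reported in Tables 9-3 and 9-4 can be illustrated with a short sketch. It assumes each SME rated every response option, so that each MOS yields a vector of item means (one mean per response option); the grand mean is the mean of that vector, and the intercorrelations are Pearson correlations between vectors. The MOS labels and ratings below are made up for illustration and are not the project's data.

```python
from itertools import combinations
from statistics import mean, pstdev


def item_means(ratings):
    """ratings: one row per SME, one effectiveness rating per response option."""
    return [mean(column) for column in zip(*ratings)]


def pearson(x, y):
    """Pearson correlation; assumes both vectors have nonzero variance."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def mos_summary(ratings_by_mos):
    """Return grand means per MOS and correlations between item-mean vectors."""
    vectors = {mos: item_means(r) for mos, r in ratings_by_mos.items()}
    grand_means = {mos: mean(v) for mos, v in vectors.items()}
    correlations = {(a, b): pearson(vectors[a], vectors[b])
                    for a, b in combinations(vectors, 2)}
    return grand_means, correlations


# Hypothetical ratings on a 1-7 scale: two SMEs per MOS, three response options.
example = {
    "MOS_A": [[1, 4, 7], [3, 4, 5]],
    "MOS_B": [[2, 5, 6], [2, 3, 6]],
}
grand, corr = mos_summary(example)
print(grand)
print(corr)
```

In the workshops described above each vector would have 34 entries, which is why N = 34 for the correlations even though only seven to nine SMEs rated per MOS.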

Additional  data  were  gathered  on  180  of  the  best  situations  during  the 
field  tests  (see  Table  9-1).  Field  test  incumbents  responded  to  experimental 
items  by  assessing  the  effectiveness  of  each  listed  response  option  on  a  scale 
of  1  to  7,  and  by  indicating  which  option  they  believed  was  most  and  which 
least  effective.  During  the  analysis  of  the  field  test  data,  the  content  of 
open-ended  responses  from  higher  rated  versus  lower  rated  soldiers  was 
compared  to  help  guide  the  generation  of  more  response  alternatives.  In 
addition,  comparisons  were  made  between  the  perceived  effectiveness  levels 
(i.e.,  effectiveness  ratings)  of  response  alternatives  from  higher  rated 
versus  those  from  lower  rated  soldiers.  Response  alternatives  were  revised 
and  some  situations  dropped  between  the  first  and  second  field  tests.  In 
addition,  the  effectiveness  level  comparisons  and  response  revisions  and 
situation  drops  were  repeated  for  the  second  and  third  field  tests. 

Two additional workshops were conducted at Fort Devens and Fort Sam
Houston, with seven to nine NCOs in each. At these workshops effectiveness
scale values were gathered from "expert" NCOs for each response alternative,
the SJT was revised and refined, and a scoring key was developed.
Table 9-2

Situation Workshops Completed and Work Accomplished

                                                                  Situations
                                    Situations     Situations     for Which
                                    Generated      Reviewed       Individual
                                    With "Best"    By Small       Short Answers
Workshop Site               MOS     Response       Groups         Were Written*

Fort Campbell               Mixed       74              0              0
Fort Sam Houston            91A/B       40              0              0
Fort Gordon                 31C         64             64              0
Fort Sill                   13B         71            115              0
Fort McClellan              95B         45             60              0
Fort Ben Harrison           71L         34             25             40
Aberdeen Proving Grounds    63B         32             24             50
Fort Eustis                 88M         25             47             40
Fort Benning                11B         35            134              m

Total                                  236

*Seven to nine per situation

Table 9-3

Grand Means of Situation Response Effectiveness by MOS

                  Items     People                Standard
MOS                 N          N        Mean      Deviation

71L                34          7        4.53        1.22
63B                34          8        4.64        1.42
88M                34          7        4.76        1.46
11B                34          9        5.42        1.12
Total Sample       34         31        4.89        1.13
Table 9-4

Intercorrelations of Vectors of Item Means for Each MOS
and for the Total Sample (N = 34)

                 Total
                 Sample      71L      63B      88M

Total Sample      1.00
MOS 71L            .83      1.00
MOS 63B            .91       .70     1.00
MOS 88M            .83       .57      .71     1.00
MOS 11B            .87       .67      .73      .60


A final set of 35 test items was selected on the basis of four criteria:
(a) good agreement among SMEs on "correct" responses, less agreement among
incumbents; (b) item content representation; (c) good distractors; and (d)
USASMA proponent feedback. There are three to five response options per item.
The instructions and an example item are shown in Figure 9-5. Examinees are
asked to indicate the most and least effective response alternative to each
situation. The Reading Grade Level of the test, as assessed using the FOG
index, is seventh grade. Subsequent to Project A, various scoring schemes
will be developed using the effectiveness ratings for response alternatives
obtained in the field tests and the item analyses to be conducted using CVII
data. These scoring approaches include weighting an examinee's "most
effective" choice for a situation by that response alternative's effectiveness
scale values (provided by SMEs).
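
The weighting approach described above can be sketched as follows. This is an illustration under stated assumptions, not the project's scoring key: the item identifiers, option letters, and scale values are hypothetical, and the "most minus least" variant is an additional assumption that the text does not describe.

```python
def score_sjt(answers, key):
    """Score one examinee's SJT answers against SME effectiveness values.

    answers: {item_id: (most_choice, least_choice)}
    key:     {item_id: {option: mean SME effectiveness rating}}
    """
    most_weighted = 0.0
    most_minus_least = 0.0
    for item_id, (most, least) in answers.items():
        values = key[item_id]
        # Scheme from the text: credit the "most effective" choice with the
        # SME scale value of the option the examinee picked.
        most_weighted += values[most]
        # Assumed variant: also penalize calling an effective option "least".
        most_minus_least += values[most] - values[least]
    return {"most_weighted": most_weighted, "most_minus_least": most_minus_least}


# Hypothetical key (1-7 effectiveness scale) and one examinee's answers.
key = {
    "s1": {"a": 2.0, "b": 5.5, "c": 6.5, "d": 3.0},
    "s2": {"a": 1.5, "b": 4.0, "c": 6.0},
}
answers = {"s1": ("c", "a"), "s2": ("b", "a")}
print(score_sjt(answers, key))
```

Because the key stores a scale value for every option, a near-best choice still earns most of the credit, which a simple right/wrong key would not allow.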

In addition to providing SMEs to generate scaling data, USASMA provided
a proponent review of the final test. As with the role-play exercises, USASMA
reviewers considered the SJT to be a fair and appropriate method for assessing
supervisory performance. The SJT also shares with the role-plays the
limitation that it was not thoroughly field tested prior to administration to the
CVII sample. Consequently, the CVII data collection is most appropriately
considered a field test.
INSTRUCTIONS


In this booklet, you will be presented with a series of supervisory
situations. These are situations in which a first-line supervisor might find
him/herself. After each situation several possible responses to that
situation are listed.

Read each situation and the responses listed. Then decide which of these
possible responses would be the most effective. Place an "M" in the box next
to the most effective response.

Next decide which of these possible responses is the least effective. Place
an "L" in the box next to the least effective response. The boxes in front of
the remaining response alternatives should be left blank.

Below is an example of an item which has been completed properly.

You are a squad leader. Over the past several months you have
noticed that one of the other squad leaders in your platoon
hasn't been conducting his CTT training correctly. Although this
hasn't seemed to affect the platoon yet, it looks like the
platoon's marks for CTT will go down if he continues to conduct
CTT training incorrectly. What should you do?


Do  nothing  since  performance  hasn't  yet  been  affected. 


Have a squad leader meeting and tell the squad leader who
has been conducting training improperly that you have
noticed some problems with the way he is training his troops.


Tell your platoon sergeant about the problem.


Privately pull the squad leader aside, inform him of the
problem, and offer to work with him if he doesn't know the
proper CTT training procedure.

You may not agree with the placement of the "M" and the "L" for this item, but
this example shows you how these items should be completed.

In summary, for each item you will place an "M" for Most effective next to one
response alternative, and an "L" for Least effective next to another response
alternative. The boxes in front of the rest of the response alternatives will
be left blank. Please use only one "M" and only one "L" per item.


Figure 9-5. Situational Judgment Test Instructions.
SUPPLEMENTAL INFORMATION

Several instruments designed to obtain supplemental information were
included in the set of second-tour measures:

Army  Job  Satisfaction  Questionnaire.  The  Army  Job  Satisfaction  Ques¬ 
tionnaire  was  administered  to  both  first-tour  and  second-tour  soldiers. 

Job  History  Questionnaire.  A  Job  History  Questionnaire  was  included  in 
the  final  set  of  second-tour  criterion  measures.  This  instrument  is  the  same 
as  that  used  for  first-tour  soldiers  except  that  it  lists  the  tasks  selected 
for  second-tour  soldier  testing. 

Background  Information  Form.  As  with  the  first-tour  soldiers,  it  was 
necessary to gather a few items of descriptive information on each examinee
(e.g., Social Security Number). The Background Information Form developed for
second-tour  soldiers  also  included  several  questions  related  to  the  extent  of 
the  examinee's  supervisory  experience. 

Measurement  Method  Rating.  Because  two  novel  testing  strategies  were  to 
be  incorporated  into  the  set  of  second-tour  criterion  measures,  a  Measurement 
Method Rating form was also included. This form is similar to the one used
during  the  Concurrent  Validation,  but  was  modified  to  reflect  the  new  testing 
methods. 

A  list  of  the  complete  array  of  second-tour  measures  and  supplemental 
information  is  provided  in  Table  9-5. 


Table  9-5 

Second-Tour  Criterion  Measures  and  Supplemental  Information 


Criterion  Measures: 

Personnel  File  Form  II* 

Army-Wide  Performance  Rating  Scales  II 
MOS-Specific  Rating  Scales  II 
Combat  Performance  Prediction  Scales 
Supervisory  Simulation  Exercises 
Situational  Judgment  Test 
Hands-on  Tests  II 
Job  Knowledge  Tests  II 

Supplemental  Information: 

Background  Information  Form  II 
Army  Job  Satisfaction  Questionnaire 
Job  History  Questionnaire  II 
Measurement  Method  Rating  II 

* "II" indicates that this version is specific for second-tour soldiers.
Chapter 10

LONGITUDINAL  VALIDATION  CRITERION  DATA  COLLECTION 


The  longitudinal  criterion  data  collection  began  in  July  1988  and  was 
completed  in  February  1989.  The  primary  purpose  of  the  data  collection  was  to 
test  first-tour  soldiers  who  had  taken  the  Experimental  Predictor  Battery  as 
they  entered  the  Army  (the  "LVI"  sample).  A  second  purpose  of  the  data 
collection was to collect second-tour performance data (the "CVII" sample)
from  soldiers  who  had  also  participated  in  the  Concurrent  Validation  (the 
"CVI"  sample).  As  with  the  Concurrent  Validation,  data  collections  were 
planned  for  13  CONUS  installations  and  USAREUR.  The  data  collection  schedule 
at those installations is shown in Table 10-1.


Table  10-1 


LVI/CVII  Data  Collection  Test  Dates,  1988-89 


Post               Dates

Fort Lewis         11 Jul - 5 Aug
Fort Bragg         18 Jul - 17 Aug
Fort Riley         19 Jul - 11 Aug
Fort Hood          25 Jul - 24 Aug
Fort Ord           6 Sep - 30 Sep
Fort Bliss*        15 Sep - 29 Sep and 9 Jan - 20 Jan
Fort Campbell      3 Oct - 28 Oct
USAREUR            10 Oct - 16 Feb
Fort Knox*         11 Oct - 23 Nov
Fort Sill*         17 Oct - 28 Oct
Fort Polk          17 Oct - 10 Nov
Fort Benning*      14 Nov - 18 Nov and 5 Dec - 9 Dec and 19 Dec - 20 Dec
Fort Carson        2 Dec - 16 Dec
Fort Stewart       3 Jan - 3 Feb

* Indicates first tour only


DATA  COLLECTION  PROCEDURES 
Advance  Coordination 


Advance  site  coordination  for  each  military  installation  was  accomp¬ 
lished  via  extensive  correspondence  (written  and  phone)  and  either  one  or  two 
test  site  visits.  The  first  site  visit  provided  briefings  to  post  commanders 
and/or  their  representatives  to  clarify  the  data  collection  objectives, 
activities,  and  requirements.  One  to  two  weeks  prior  to  the  actual  data 
collection,  project  staff  members  visited  the  installation  to  examine  the  test 
site  and  discuss  equipment,  supplies,  and  other  special  requirements  for  the 
data  collection  and  set-up  of  the  hands-on  test  stations. 


Using  updated  listings  from  the  Army's  Worldwide  Locator  Service,  post 
POCs  were  given  a  list  of  the  names  of  target  examinees  who  were  shown  to  be 
stationed  on  that  post.  The  POCs  used  this  list  to  identify  the  soldiers  whom 
they  needed  to  schedule  for  testing.  To  ensure  that  sufficient  data  from  each 
MOS  were  collected,  the  samples  were  augmented  with  additional  soldiers  who 
were  not  in  the  original  sample,  but  were  in  the  appropriate  MOS  with  the 
requisite time in service to make them comparable to the characteristics of
the target examinees. The operational definition of first- and second-tour
soldiers for this data collection was: First-tour soldiers entered the
service between 20 Aug 85 and 20 Nov 87; second-tour soldiers entered the
service during the period 1 Jul 83 to 30 Jun 84.
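
The operational definition above amounts to a lookup on a soldier's service entry date. The following is only an illustrative sketch: the function name `classify_tour` and the treatment of both date windows as inclusive are assumptions, not part of the Project A procedures.

```python
from datetime import date
from typing import Optional

# Entry-date windows quoted in the text; inclusive bounds are an assumption.
FIRST_TOUR_WINDOW = (date(1985, 8, 20), date(1987, 11, 20))
SECOND_TOUR_WINDOW = (date(1983, 7, 1), date(1984, 6, 30))


def classify_tour(entry_date: date) -> Optional[str]:
    """Return 'first', 'second', or None when the date is in neither window."""
    if FIRST_TOUR_WINDOW[0] <= entry_date <= FIRST_TOUR_WINDOW[1]:
        return "first"
    if SECOND_TOUR_WINDOW[0] <= entry_date <= SECOND_TOUR_WINDOW[1]:
        return "second"
    return None
```

A soldier whose entry date falls in neither window (for example, early 1985) would not meet either definition under this sketch.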


Test Site Staffing and Training

Generally, each test site required the following personnel:


Test Site Manager (TSM)                      1

Hands-on Managers (HOM)                      2

Hands-on Assistants                          2

Paper-and-Pencil, Rating Scale, and
Role-Play Administrators                     5


Additionally,  the  Army  posts  provided  eight  NCOs  per  MOS  to  administer  and 
score  hands-on  tests. 

Training of Primary Staff. Most of the nonmilitary test site staff were
permanent  employees  of  the  contractor  consortium.  However,  a  significant 
number  of  additional  primary  staff  had  to  be  hired  on  a  temporary  basis 
because  of  the  special  requirements  imposed  by  the  role-plays.  These 
additional  test  site  personnel  played  the  roles  of  problem  subordinates  in  the 
role-play  simulations  and  served  as  the  role-play  scorers.  Much  of  the 
training for in-house staff members took place during the Concurrent
Validation and the second-tour field tests. In addition, a formal training program
was  conducted  just  prior  to  the  start  of  the  LVI/CVII  data  collection  trips. 

In preparation for the formal training program, three manuals were
constructed: (a) a Test Administrator's Manual, (b) a Test Site Manager's
Manual, and (c) a Hands-On Manager's Manual. The instructional materials
included the following elements:

• Project A background

• Things to know on an Army post (e.g., rank insignia)

• Criterion measure administration (including dry runs)

• Maintaining integrity of tests and data

The  training  materials  were  covered  in  a  2-day  training  session.  The 
individuals  who  were  designated  role  players  had  an  additional  3  days  of 
intensive  role-play  actor/scorer  instruction  (see  Figure  9-4). 

The individuals selected for TSMs and HOMs were generally more
experienced than the other test site members. The HOMs, particularly, had to
be familiar with the equipment and procedures involved with the tests they
would administer for each MOS. For some MOS, such familiarity takes a
significant amount of experience to acquire because of factors such as
complexity or diversity of equipment that is used. A good example is Light
Vehicle Repairer (MOS 63B). An HOM for this MOS must be familiar with many
different vehicles so that when the requested vehicle for a task test is
unavailable (as will invariably happen from time to time), he or she can
specify a suitable alternative.

Hands-On Scorer Training. Training of all military scorers at the test
sites was conducted in conjunction with the actual data collection. NCO
scorers for each MOS received from 1 to 2 days of hands-on test administration
training prior to the test administration (one day for first-tour tests plus
one day for second-tour tests, if applicable). This training was provided on
an MOS-specific basis by the HOM for that MOS.

The training followed the procedures that had been developed for the CV
data collection (Campbell, 1985). This program was designed not to train the
NCOs in how to perform the tasks, but to ensure that each NCO scorer had a
fairly high degree of scoring expertise and familiarity with the task tests.

Daily Logistics

The  schedule  for  administering  the  criterion  measures  was  arranged  so 
that  no  more  than  two  Batch  A  MOS  (first-  and/or  second-tour)  would  be 
assessed  on  a  given  day.  Batch  Z  testing  was  usually  conducted  on  days  when 
NCO  scorers  were  being  trained  to  administer  HO  tests  to  the  Batch  A 
examinees.  The  general  plans  for  administering  the  criterion  measures  to 
these three groups of examinees (Batch A first tour, Batch Z first tour, Batch
A  second  tour)  are  outlined  below.  Batch  A  testing  required  one  day  per 
examinee  and  Batch  Z  testing  required  one-half  day  per  examinee. 

All  test  administration  sessions  began  in  the  same  way.  The  examinees 
assembled  and  roll  was  taken  so  that  a  search  could  start  for  any  missing 
personnel.  A  project  staff  member  would  then  introduce  the  soldiers  to 
Project  A  and  review  the  activities  in  which  they  would  participate  throughout 
the  day.  The  Privacy  Act  was  read  aloud  to  the  soldiers  at  this  time. 

Soldiers  also  identified  those  individuals  for  whom  they  would  be  able  to 
provide  peer  ratings.  If  there  were  20  or  more  soldiers  in  a  Batch  A  MOS  or 
if  there  were  both  first-  and  second-tour  examinees  present,  the  total  group 
was  divided  appropriately  into  subgroups. 

Batch A First Tour. The Batch A first-tour assessment schedule is shown
in Figure 10-1. The HO testing was set up to process a maximum of 20 soldiers
in a 4-hour period. Eight NCO scorers were needed to meet this schedule.

Thus,  when  there  were  more  than  20  first-tour  soldiers  from  a  given  MOS  to  be 
tested,  they  were  divided  into  two  groups.  One  group  took  the  HO  tests  in  the 
morning  while  the  other  group  took  the  other  criterion  measures.  After  lunch, 
roll  was  taken  again  and  the  activities  of  the  two  groups  were  reversed.  The 
HO tests for the two MOS were administered in separate locations; however, the
written  tests  and  ratings  were  often  administered  to  both  MOS  together.  This 
minimized  requirements  for  test  site  staff  personnel. 
               MOS A                  MOS B
Time     Group 1   Group 2      Group 1   Group 2

0730        In-Processing          In-Processing
0800     HO        JK           HO        JK
0900     HO        JK           HO        JK
1000     HO        X1           HO        X1
1100     HO        X1           HO        X1
1200           Lunch                  Lunch
1300     JK        HO           JK        HO
1400     JK        HO           JK        HO
1500     X1        HO           X1        HO
1600     X1        HO           X1        HO

Legend:  HO = Hands-On Tests
         JK = Job Knowledge Tests
         X1 = Personnel File Information Form
              Job History Questionnaire
              Peer Ratings (AW/MOS-specific BARS & Combat Scales)
              Physical Requirements Survey


NOTE: This schedule assumes four groups of examinees (maximum n=20); two
groups for each of two MOS.


Figure 10-1. Batch A first-tour criterion administration schedule.


Batch A Second Tour. On days when second-tour soldiers were being
tested, there was normally one group of first-tour soldiers and one group of
second-tour soldiers per MOS. The general test administration plan that was
used when second-tour examinees were involved is shown in Figure 10-2. The
second-tour schedule differs from the first-tour schedule in that one-half of
the day was devoted to a combination of 3 hours of HO testing and 1 hour of
supervisory simulation exercises, and the other one-half day was devoted to a
somewhat different combination of written tests and ratings. Specifically,
the time devoted to the job knowledge test was reduced from 2 hours to 1 hour
to make time for the 1-hour Situational Judgment Test.


               MOS A                  MOS B
Time     1st Tour  2nd Tour     1st Tour  2nd Tour

0730        In-Processing          In-Processing
0800     HO        JK           JK        HO
0900     HO        X2           JK        HO
1000     HO        X2           X1        HO
1100     HO        S            X1        HO
1200           Lunch                  Lunch
1300     JK        HO           HO        JK
1400     JK        HO           HO        X2
1500     X1        HO           HO        X2
1600     X1        HO, M        HO        S, M

Legend:  HO = Hands-On Tests
         JK = Job Knowledge Tests
         S  = Situational Judgment Test
         X1 = Personnel File Information Form
              Job History Questionnaire
              Job Satisfaction Questionnaire
              Peer Ratings (AW/MOS-specific BARS & Combat Scales)
              Physical Requirements Survey
         X2 = Personnel File Information Form
              Job History Questionnaire
              Job Satisfaction Questionnaire
              Peer Ratings (AW/MOS-specific BARS & Combat Scales) or
              ABLE
         M  = Measurement Method Ratings

NOTE: This schedule assumes four groups of examinees (maximum n=20); two
groups (one first tour, one second tour) for each of two MOS.

Figure 10-2. Batch A first-/second-tour criterion administration schedule.


There was an expectation that a significant percentage of second-tour
soldiers would not be able to provide peer ratings. One of the primary
problems is that soldiers at this level often work much more autonomously than
their first-tour counterparts; another problem is that second-tour soldiers
were tested in very small groups, thus decreasing the likelihood that there
were many pairs of co-workers. Plans were therefore made to make the most of
the time that examinees not making peer ratings would have available. The
Project A biodata predictor, ABLE, was selected as the instrument examinees
would complete if they could not make peer ratings. This instrument was
chosen because (a) many of the second-tour examinees would be supplemental
(i.e., no Project A predictor data would be available for them), and (b) the
Army's decision to implement ABLE made it a prime candidate for additional
data collection.

Batch Z. The maximum number of Batch Z soldiers who were tested at one
time  was  generally  30.  The  test  administration  schedule  appears  in  Figure 
10-3. 
Figure 10-3. Batch Z criterion administration schedule.


Supervisor Ratings. The goal was to obtain two supervisor ratings for
each examinee. Supervisor raters were identified with the assistance of the
examinees and the NCO support staff. One of the project staff was responsible
for coordinating efforts to (a) identify the supervisors, (b) schedule rating
administration sessions with them, and (c) administer the supervisory rating
sessions. The supervisory rating sessions ran concurrently with the other
data collection and scorer training activities. Supervisors were requested to
report on the same day as their subordinates.

Assessment of Interscorer Agreement (Hands-On and Role-Play)

Although some effort was devoted to assessing hands-on test reliability
in early Project A data collection efforts, the information was inadequate
for providing a reasonable assessment of the interrater reliability of these
measures. Consequently, shadow-scoring efforts were incorporated into the
LVI/CVII data collection. Interrater reliability estimation efforts
focused on the first-tour HO tests for two Batch A MOS (11B and 91A).
Collecting shadow-scoring data for these two MOS was arranged at several data
collection sites and required a total of 12 scorers (instead of the formally
requested eight) for each of these MOS. All scorers were trained to run two
of the eight HO testing stations. Four extra scorers were designated as
shadow-scorers, and they followed a randomly selected subset of examinees from
station to station. Thus, for a subset of 11B and 91A examinees, performance
on all of their HO tests was rated by two scorers.

Shadow-scoring data for the supervisory simulations were also collected
at test locations in USAREUR. This was possible because there were always at
least four trained role-players at each of these test sites and only three
simulations were being conducted at any one time. Thus, one individual was
available to observe one of the ongoing simulations and provide an independent
set of scores for the examinee. Again, the issue was whether the performance
ratings assigned by the role-player scorers are reliable across different
scorers. Pending data entry, the sample size and analysis results are not
known.


SAMPLE  SIZES 

Pending data entry, exact sample sizes are unknown. Table 10-2,
however, provides reasonable estimates of the LVI/CVII sample sizes. The
figures are broken down by installation, MOS, and tour (first or second).


Table 10-2

Project A LVI/CVII Estimated Data Collection Totals


FIRST-TOUR SOLDIERS: BATCH A

SECOND-TOUR SOLDIERS: BATCH A

BATCH Z

Batch A First-Tour Total       6,820

Batch Z Total                  4,448


Total First Tour              11,268

Total Second Tour              1,053

Grand Total                   12,321
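
As a quick consistency check on the estimated totals, the arithmetic can be reproduced as follows. This is only a sketch: the variable names are illustrative, and treating all Batch Z examinees as first-tour soldiers follows the description of the three examinee groups earlier in this chapter.

```python
# Estimated counts as reported in Table 10-2.
batch_a_first_tour = 6820
batch_z_first_tour = 4448   # Batch Z was tested as first tour only
total_second_tour = 1053

# First-tour total is the sum of the Batch A and Batch Z first-tour samples.
total_first_tour = batch_a_first_tour + batch_z_first_tour

# Grand total combines both tours.
grand_total = total_first_tour + total_second_tour

print(f"Total first tour: {total_first_tour:,}")
print(f"Grand total:      {grand_total:,}")
```

The computed first-tour figure matches the 11,268 reported in the table, and the two tour totals sum to 12,321.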
Chapter 11
EPILOGUE 


This final statement on Project A work begins with a brief history of
selection and classification, to characterize the context and sequences in
which the Project A research has been performed. The chapter closes with a
summary list of Project A products and results, in terms of both scientific
achievement and practical application.


A  BRIEF  HISTORY  OF  SELECTION  AND  CLASSIFICATION 


Formal personnel selection and classification using standardized
measures of individual differences actually began in 1115 B.C. with the system
of competitive examinations that led to appointment to the bureaucracy of
Imperial China (DuBois, 1964). It soon included the selection/classification
of individuals for particular military specialties, as in the selection of
spear throwers with standardized measures of long-distance visual acuity
(e.g., identification of stars in the night sky).

Systematic  attempts  to  deal  with  selection/classification  issues  have 
been  a  part  of  military  management  ever  since.  Military  organizations  are 
virtually  unique  in  their  need  to  make  large  numbers  of  complex  personnel 
decisions  in  a  short  space  of  time.  However,  the  centrality  of  criterion- 
related  validation  to  a  technology  of  selection  and  classification  was  not 
fully  articulated  until  World  War  II,  and  research  and  development  sponsored 
by the military has been the mainstay of growth in that technology from then
to  the  present. 


The contributions of military psychologists during World War II are
well-known and well-documented. The early work of the Personnel Research
Branch of The Adjutant General's Office was summarized in a series of articles
in the Psychological Bulletin (Staff, PRB, AGO, 1943 a, b, c, d, e, and f).
Later work was published in Technical Bulletins and in such journals as
Psychometrika, Personnel Psychology, and Journal of Applied Psychology. The
Aviation Psychology Program of the Army Air Forces issued 19 volumes, with a
summary of the overall program presented in Volume I (Flanagan, 1948). In the
Navy, personnel research played a smaller and less centralized role, but here
too useful work was done by the Bureau of Naval Personnel (Stuit, 1947).


Much  new  ground  was  broken.  There  were  important  advances  in  the 
development  and  analysis  of  criterion  measures;  Thorndike's  textbook  based  on 
his  Air  Force  experience  presented  a  state-of-the-art  classification  and 
analysis  of  potential  criteria  (Thorndike,  1949).  Improvements  were  made  in 
rating  scales.  Forced-choice  methods  were  developed  by  the  Personnel  Research 
Branch;  checklists  based  on  critical  incidents  were  used  in  the  AAF  program. 
The sequential aspect of prediction was articulated and examined; tests
"validated" against training measures (usually pass/fail) were checked against
measures of success in combat (usually ratings or awards). At least one
"pure"  validity  study  was  accomplished,  when  the  Air  Force  sent  1,000  cadets 
into  pilot  training  without  regard  to  their  pilot  stanine  derived  from  the 
classification  battery.  This  remains  one  of  the  few  studies  that  could  report 
validities  without  correcting  for  restriction  of  range.  Historically,  1940  to 
1946  was  a  period  of  concentrated  development  of  selection  and  classification 
procedures, and the further accomplishments of the next several decades flowed
directly  from  it. 

In part, this continuity is attributable to the well-known fact that
many of the psychologists who had worked in the military research
establishments during the war became leaders in the civilian research community after
the  war.  In  part,  it  is  attributable  to  the  less  widely  recognized  fact  that 
the  bulk  of  the  work  continued  to  be  funded  by  military  agencies.  The  Office 
of  Naval  Research,  the  Army's  Personnel  Research  Branch  (and  its  successors), 
and  the  Air  Force  Human  Resources  Research  (HRR)  installations  were  the 
principal  sponsors. 

The bibliography is very long. Of special relevance to the present
project is the pioneering work on differential prediction by Brogden (1946a,
1951) and Horst (1954, 1955); on utility conceptions of validity by Brogden
(1946b) and Brogden and Taylor (1950); on the "structure of intellect" by
Guilford (1957); on the establishment of critical job requirements by Flanagan
and associates (Flanagan, 1954); and on the decision-theoretic formulations of
selection and classification developed by Cronbach and Gleser (1957) for the
Office of Naval Research. The last of these (Psychological Tests and
Personnel Decisions) was hailed quite appropriately as a breakthrough (a "new look"
in selection and classification), but the authors were the first to acknowledge
the relevance of the work of Brogden and Horst cited above. It was the
culmination of a lengthy sequence of development.

Project A was carried out in the context of this impressive history, and
it has become another milestone. It is by far the most comprehensive
personnel research and development project ever attempted. It is unique in that a
complete personnel system is being examined at one time. The jobs (MOS) to be
studied were sampled representatively from the complete population, new
predictor measures were sampled systematically from the complete domain of
potential information, and job performance was assessed as thoroughly as
possible with multiple measures. Given this data base, and using
state-of-the-art analytic techniques, the functioning of the complete
selection/classification decision process can be modeled and actually evaluated under various
goals or constraints. Project A is truly a landmark in personnel research.

PROJECT  A  PRODUCTS  AND  RESULTS 

The Project A products in the following list are of two general kinds:
products for the "science" (personnel research) and products for the
organization (the Army). The list is intended to move from the scientific to the
applied. However, the distinction is not always easy to make since many
products are useful for both.

(1) There exist, in technical report form, comprehensive reviews of
all validity evidence pertaining to selection and classification
for skilled jobs. These are the most comprehensive such reviews
ever done.

(2) The question of whether the Armed Services Vocational Aptitude
Battery (ASVAB) does or does not predict job performance (in
addition to training performance) has been answered definitively,
in the affirmative. The Army and the Department of Defense are
now in a firmer position to support their quality goals. In
addition, it is now known what aspects of performance ASVAB
predicts best and which aspects of performance could be predicted
better with other types of selection instruments.

(3) A set of new experimental tests has been developed to measure
noncognitive, psychomotor, perceptual, and cognitive characteristics
that are not now measured by the ASVAB. The scope of Project A
made it possible to examine virtually the entire domain of
selection information, sample from it, and investigate the basic
incremental validity produced by each major piece of information.

(4) Using much more comprehensive samples than ever before, new ASVAB
Aptitude Area composites have been developed which are firmly
data-based and empirically defensible.

(5) The results of an expert judgment study of expected correlations
between predictor constructs and performance factors are
available. In brief, a large sample of personnel experts considered
the population of predictor and criterion variables appropriate
for entry-level jobs and forecasted what the validity coefficients
would be. The consistency in the judgments and their
correspondence with known data points make these a potentially valuable
tool for future test selection and synthetic validation work.

(6) Much has been learned about the nature of performance in
entry-level skilled jobs (e.g., first-tour MOS). We now have a much
clearer idea of what major factors constitute performance and how
they can be measured. The "criterion problem" is better
understood. This knowledge base should better inform future enlistment
and promotion policy, as well as future personnel research.

(7) The Concurrent Validation data support the assertion that
supervisor ratings of subordinate performance have considerable
construct validity if a careful measurement procedure is followed.
The data also support the conclusion that supervisors seem to
assess both the technical performance of individuals and their
general dependability/motivation at the same time.

(8) Within the limits of the Concurrent Validation design, the
incremental validity of appropriate ABLE scales for predicting the
"will do" components of performance has been demonstrated.

(9) The potential of the AVOICE for differentially predicting "can do"
performance in combat vs. technical vs. administrative support MOS
has been established. What is needed to make this finding
operational is empirical scoring keys.

(10) The Project A job/task analysis procedures worked well and can be
used by the Army in the future to develop training curricula,
Skill Qualification Test (SQT) content, performance measures, and
field exercises. The job analysis summaries for each MOS serve as
a model for future job analysis work in the Army as well as in the
public and private sectors.

(11) Advanced Individual Training (AIT) achievement measures have been
developed for 21 MOS. The training measures will allow a
determination of whether training performance predicts job performance,
and whether it does so differentially for different groups of
trainees (race, gender), and different groups of MOS (combat,
combat support, combat service support).

(12) The package of rating scale administration procedures can be used
in future personnel research in the Army. A major effort in the
Project A research was to develop an effective and efficient set
of procedures for administering performance rating scales to large
numbers of people. These procedures and the package of materials
can be adapted for use in other Army personnel research where
ratings of many persons are required.

(13) The Supervisory Description Questionnaire (which came out of
second-tour job analyses work) is a useful instrument for future
work in the design of leadership training or the evaluation of
leadership/supervisor performance. The questionnaire is based on
a clear rationale and is straightforward to use.

(14) Project A developed a common utility scale for making comparisons
across MOS and performance levels within MOS. Although it does
not speak to marginal utility issues, it can be used to enhance
the comparison of alternative selection/classification procedures.

(15) One very real, and very important, product is the Project A data
base itself. It is by orders of magnitude the largest and most
completely documented personnel research data base in existence.
Appendix A

CHARACTERISTICS OF ARMY PERSONNEL SYSTEM
(Described as of February 1989)

The major stages of the selection, classification, and assignment
process for persons entering enlisted service in the Army are presented in
Table A-1. The size, diversity, and widespread geographical distribution of
Army activities have long dictated that the initial stages of personnel
recruitment, selection, classification, and training be performed across many
specialized units or activities and by personnel who have been specifically
trained for these functions with guidance from command. Certain other
functions are both formalized and carried out at the command level. These
include unit or on-the-job training; performance evaluation; and decisions (or
recommendations) concerning promotion, discipline, reassignment, and retention
or separation from service. The major stages of the process as of February
1989 are discussed below.


Recruitment

It is difficult to discuss recruitment, selection, and classification
separately. They are interdependent processes. Their complementary nature
should be evident in the ensuing discussion.

The Army has succeeded in meeting or approximating its numerical
recruitment quotas in most of the years following the change to an
All-Volunteer Force, resulting in an annual average of about 120,000-140,000
enlisted accessions from over twice as many applicants in the preceding 10
fiscal years. Furthermore, many qualified applicants do not enter active duty
immediately but enter the delayed entry program (DEP) where they await a
training slot.

The Army seeks to recruit the most capable personnel. Quality is
generally defined in terms of high school graduation status and average or
above scores on the Armed Forces Qualification Test (AFQT). The AFQT is a
composite of four subtests (comprising verbal and math content) from the
overall selection and classification instrument, the Armed Services Vocational
Aptitude Battery (ASVAB). AFQT scores are reported in percentiles relative
to the national youth population. For convenience, they are grouped into the
following categories and subcategories:


AFQT Category     Percentile Score Range

I                 93 - 100
II                65 - 92
IIIA              50 - 64
IIIB              31 - 49
IVA               21 - 30
IVB               16 - 20
IVC               10 - 15
V                  1 - 9




Table A-1


The Army Selection, Classification, and Evaluation Process


Stage/Activity         Process                            Results

Recruitment            o Recruiting Incentives,           o To MET Sites
(U.S. Army               Options                            or MEPS
Recruiting Command)    o Recruiter Interviews             o Disqualified
                       o Aptitude Pre-Screening
                         Test (EST) (CAST)
                       o Records Checks

Selection/             o Aptitude Testing (ASVAB)         o To Training Center
Classification         o Physical Exam (PULHES)           o Disqualified
(MEPS)                 o Moral Screening
                       o Special Tests
                       o Skill/Training Counseling
                       o Classification

Entry Training         o Basic Combat Training            o To units
(Army Training         o Individual Training              o Reassigned/
Centers & Schools)     o Training Evaluation                Recycled
                       o Assignment                       o Discharged
                       o Disciplinary Reviews
                       o Special Courses
                         (NRI, etc.)

First Term             o Unit (on-the-job) Training       o Promotion/
(Operating Units)        and Mission Activities             Demotion
                       o Special Courses                  o Discharged
                         (NRI, etc.)                        (prior to ETS)
                       o Evaluation-SQT Ratings,          o Separation (ETS)
                         Disciplinary Reviews             o Reenlistment
                       o Promotion Eligibility
                       o Reenlistment Counseling
                         and Screening
                       o Army Continuing Education
                         System

Second Term            o Unit Training and Mission        o Promotion/
(Operating Units)        Activity                           Demotion
                       o Advanced Technical/              o Reassigned
                         Leadership Training              o Discharged
                       o Evaluation                         (prior to ETS)
                       o Promotion Eligibility            o Separation (ETS)
                                                          o Reenlistment


Categories I and II signify well-above and above average trainability,
respectively. Category III denotes average trainability, and Category IV
signifies below average trainability. Individuals scoring within Category V
are, by law, ineligible for enlistment. Because of their likelihood of
success in training (and now with evidence of the AFQT's relationship to job
performance), the Army attempts to maximize the recruitment of those scoring
within Categories I through IIIA. In addition, because traditional high
school graduates are more likely to complete their contracted enlistment
terms, in contrast to nongraduates and alternative credential holders (e.g.,
GED credential holders), they are most actively recruited.
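
The category scheme described above is a simple range lookup on the AFQT percentile score. The sketch below is illustrative only: the function names (`afqt_category`, `is_enlistment_eligible`, `is_preferred_recruit`) are hypothetical and do not correspond to any Army system. The percentile ranges are those listed in the AFQT category table, and the eligibility and preference rules restate the paragraph above.

```python
# (category, lowest percentile, highest percentile), per the AFQT category table.
AFQT_CATEGORIES = [
    ("I", 93, 100),
    ("II", 65, 92),
    ("IIIA", 50, 64),
    ("IIIB", 31, 49),
    ("IVA", 21, 30),
    ("IVB", 16, 20),
    ("IVC", 10, 15),
    ("V", 1, 9),
]


def afqt_category(percentile: int) -> str:
    """Map an AFQT percentile score (1-100) to its category label."""
    for name, low, high in AFQT_CATEGORIES:
        if low <= percentile <= high:
            return name
    raise ValueError(f"percentile out of range: {percentile}")


def is_enlistment_eligible(percentile: int) -> bool:
    """Category V scorers are, by law, ineligible for enlistment."""
    return afqt_category(percentile) != "V"


def is_preferred_recruit(percentile: int) -> bool:
    """Categories I through IIIA are the group the Army tries to maximize."""
    return afqt_category(percentile) in {"I", "II", "IIIA"}
```

Note that this covers only the aptitude criterion; high school graduation status, the other component of recruit quality described above, is not modeled here.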

Though qualification for initial enlistment into the Army is based upon
a number of criteria (including age, moral standards, and physical standards),
education and particularly aptitude are the criteria that are most pervasive
and most scrutinized. The Army tries to target its advertising and aim its
recruiting resources so as to attract quality recruits. As a means of
identifying recruitment prospects, while offering a career guidance tool, the
ASVAB is administered to 900,000 high school juniors and seniors annually as
part of the DoD Student Testing Program.

In order to meet numerical requirements and budget constraints, the Army
has recruited some non-high school graduates and applicants scoring in AFQT
Category IV. And, between 1976 and 1980, as a result of the ASVAB misnorming,
the Army erroneously enlisted high proportions of these less-preferred
recruits. This situation raised concerns in Congress, and led to the
imposition of ceilings on the proportion of non-high school graduates and
Category IVs who may be enlisted. One of the outcomes of Project A will be a
much more solid empirical basis for qualification decisions. In fact, this
research is particularly timely, given indications that banner recruiting
times have tapered off.

To compete with the other Services and with the private sector for the
prime target group, the Army has had to offer a variety of special
inducements, including "critical skill" bonuses and educational incentives. A
popular inducement has been the "training of choice" enlistment to a specific
school training program, provided that applicants meet the minimum aptitude
and educational standards and other prerequisites, and that training "slots"
are available at the time of their scheduled entry into the program.

Additional options, offered separately or in combination with "training of
choice," include guaranteed initial assignment to particular commands, units,
or bases, primarily in the combat arms or in units requiring highly technical
skills. In recent years, a large proportion of all Army recruits,
particularly in the preferred aptitude and educational categories, has been
enlisted under one or more of these options. An important research
contribution would be to provide counselors with improved data-based aids to
help create optimal person-job choices in light of Army manpower needs.

The importance of aptitude in recruiting decisions is exemplified in the
prescreening of applicants at the recruiter level. For applicants who have
not previously taken the ASVAB and whose educational/aptitude qualifications
appear to be marginal based on the Army's trainability standards, the
recruiter may administer a short Computerized Adaptive Screening Test (CAST)
or Enlisted Screening Test (EST) to assess the applicant's prospects of
passing the ASVAB. The Army has also employed non-cognitive tests to identify
individuals who are likely to be poor risks in terms of the probability of
completing Army basic training. Applicants who appear, upon initial
recruiter screening, to have a reasonable chance of qualifying for service are
referred either to one of 759 Mobile Examining Team (MET) sites for
administration of the ASVAB, or directly to a Military Entrance Processing
Station (MEPS) where all aspects of enlistment testing are conducted.

Selection and Classification at the MEPS

Based on the information assembled, classification and assignment to a
particular training activity are completed at the MEPS for applicants found
qualified for enlistment.

The current versions of the ASVAB (Forms 11-13) consist of the following
10 subtests:

1. Arithmetic Reasoning
2. Numerical Operations
3. Paragraph Comprehension
4. Word Knowledge
5. Coding Speed
6. General Science
7. Mathematics Knowledge
8. Electronics Information
9. Mechanical Comprehension
10. Automotive-Shop Information

In addition to AFQT scores, subtest scores are combined to form 10
aptitude composite scores, based on those combinations of subtests that have
been found to be most valid as predictors of successful completion of the
various Army school training programs. For example, the composite score for
administrative specialties is based on the numerical operations, paragraph
comprehension, word knowledge, and coding speed subtests. The composite score
for electronics specialties is based on a combination of the scores for
arithmetic reasoning, general science, mathematics knowledge, and electronics
information.
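The two composites named above can be illustrated with a short sketch. Only
the subtest membership comes from the text. The unweighted sum, the
dictionary keys, and the function name are assumptions made for illustration;
the passage does not describe how the Army converts the combined subtests to
the reported composite scale.

```python
# Subtest membership is taken from the text for the two composites it names.
# ASSUMPTION: an unweighted sum of subtest standard scores; the real scoring
# and rescaling rules are not given in this passage.
COMPOSITE_SUBTESTS = {
    "administrative": (
        "numerical_operations",
        "paragraph_comprehension",
        "word_knowledge",
        "coding_speed",
    ),
    "electronics": (
        "arithmetic_reasoning",
        "general_science",
        "mathematics_knowledge",
        "electronics_information",
    ),
}


def composite_score(subtest_scores: dict, composite: str) -> float:
    """Combine the member subtest scores of one aptitude composite."""
    members = COMPOSITE_SUBTESTS[composite]
    missing = [name for name in members if name not in subtest_scores]
    if missing:
        raise KeyError(f"missing subtest scores: {missing}")
    return sum(subtest_scores[name] for name in members)
```

A caller would pass one applicant's subtest scores as a dictionary and ask
for each composite by name; subtests outside a composite are simply ignored.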

As stated above, eligibility for enlistment, in terms of the
trainability standard, is based upon a combination of criteria: AFQT score,
aptitude area composite scores, and whether the applicant is or is not a high
school diploma graduate. Under the most recent Army regulation*, the
following standards were in effect:


- High school graduates are eligible if they achieve an AFQT
  percentile score of 16 or higher and a standard score of 85 in at
  least one aptitude area.

- GED high school equivalency holders are eligible if they achieve
  an AFQT percentile score of 31 or higher and a standard score of
  85 in at least one aptitude area.

- Non-high school graduates are eligible only if they achieve an
  AFQT percentile score of 31 or higher and standard scores of 85 in
  at least two aptitude areas.

*Army Regulation 601-201, 1 October 1980, revised, Table 2-2.
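The three standards above amount to a small decision rule, sketched below.
The function and label names are hypothetical. Two points are assumptions:
that "a standard score of 85" means 85 or higher, and that non-high school
graduates need two qualifying aptitude areas (the count is damaged in the
source text, where only the plural "areas" survives).

```python
# ASSUMPTION: two qualifying areas for non-graduates; the source text is
# damaged at this point and shows only the plural "aptitude areas".
NON_GRADUATE_MIN_AREAS = 2

# (minimum AFQT percentile, minimum number of qualifying aptitude areas)
STANDARDS = {
    "hs_graduate": (16, 1),
    "ged": (31, 1),
    "non_graduate": (31, NON_GRADUATE_MIN_AREAS),
}


def meets_trainability_standard(education: str, afqt_percentile: int,
                                aptitude_area_scores: dict,
                                qualifying_score: int = 85) -> bool:
    """Apply the AFQT and aptitude-area standards for one applicant."""
    if education not in STANDARDS:
        raise ValueError(f"unknown education status: {education!r}")
    min_afqt, min_areas = STANDARDS[education]
    # ASSUMPTION: a score equal to the qualifying score counts as qualifying.
    qualifying_areas = sum(
        1 for score in aptitude_area_scores.values() if score >= qualifying_score
    )
    return afqt_percentile >= min_afqt and qualifying_areas >= min_areas
```

The sketch covers only the trainability standard; age, moral, and physical
standards are screened separately.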

Physical standards are captured in the PULHES profile, which rates
the applicant on General Physical (P), Upper torso (U), Lower torso (L),
Hearing (H), Eyes (E), and Psychiatric (S). The Army also sets general height
and weight standards for enlistment.

The overwhelming majority of Army enlistees enter the Army under a
specific enlistment option that guarantees choice of initial school training,
career field assignment, unit assignment, or geographical area. For these
applicants, the initial classification and training assignment decision must
be made prior to entry into service. This is accomplished at MEPS by
referring applicants who have passed the basic screening criteria (aptitude,
physical, moral) to an Army guidance counselor, whose responsibility is to
match the applicant's qualifications and preferences to the Army's current
skill training requirements, and to make "reservations" for training
assignments, consistent with the applicant's enlistment option.

For the enlistee, this decision will determine the nature of his or her
initial training and occupational assignment, future military work
environment, and chances of successful advancement in an Army career. For the
Army, the relative success of the assignment process will significantly
determine the aggregate level of performance and attrition for the entire
force.

The classification and training "reservation" procedure is accomplished
by the Recruit Quota System (REQUEST), which was implemented in 1973. REQUEST
is a computer-based system designed to coordinate the information needed to
reserve training slots for volunteers. REQUEST uses minimum qualifications
for accessions control. Thus, to the extent that an applicant may minimally
qualify for a wide range of courses or specialties, based on aptitude test
scores, the initial classification decision is governed by (a) his or her own
stated preference (often based upon limited knowledge about the actual job
content and working conditions of the various military occupations), (b) the
availability of training slots, and (c) the current priority assigned to
filling each military occupational specialty (MOS).
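The three governing factors can be illustrated with a minimal sketch of a
minimum-qualification screen followed by an ordering step. This is not the
REQUEST system itself. The record layout, the MOS labels, and the rule that
stated preference is considered before fill priority are all assumptions made
for illustration.

```python
from dataclasses import dataclass


@dataclass
class Mos:
    code: str          # hypothetical label, not a real MOS code
    composite: str     # aptitude area that gates entry
    minimum: int       # minimum qualifying composite score
    open_slots: int    # training slots still available
    priority: int      # 1 = most urgent to fill


def reservation_options(scores: dict, preferences: list, catalog: list) -> list:
    """List MOS codes the applicant could reserve, best option first."""
    # Minimum-qualification screen plus slot availability.
    eligible = [
        m for m in catalog
        if m.open_slots > 0 and scores.get(m.composite, 0) >= m.minimum
    ]

    # ASSUMPTION: stated preference ranks first, fill priority breaks ties.
    def sort_key(m: Mos):
        if m.code in preferences:
            rank = preferences.index(m.code)
        else:
            rank = len(preferences)
        return (rank, m.priority)

    return [m.code for m in sorted(eligible, key=sort_key)]
```

Because the screen uses only minimum scores, an applicant far above the
minimum and one barely above it receive the same list, which is the
limitation the text goes on to discuss.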

These interactions among recruitment, selection, and classification in
the current Army system give rise to several issues. First, there is an
evident need for decision-making algorithms designed to maximize the overall
utility of the MOS assignments. This requires that the average differential
utilities of alternative assignments be known, as well as the marginal utility
of each additional assignment to an MOS. The Army system currently
incorporates marginal utilities by specifying desired distributions of AFQT
scores, which are termed quality goals. In general, the parameters of recruit
supply and demand (e.g., number of applicants in various categories, selection
ratio, percentage of training slots filled, MOS priority) must also be taken
into account when developing decision-making algorithms for selection and
classification. The decision process must also allow for the potentially
adverse impacts on recruitment if the enlistee's interests, work values, and
preferences are not given sufficient weight. There are clear trade-offs that
must be evaluated between the procedures necessary to (a) attract qualified
people, and (b) put them into the right slots.
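The idea of maximizing the overall utility of assignments, rather than
placing each applicant in isolation, can be shown with a toy example. The
utility values are invented, and the exhaustive search is chosen only because
it is easy to verify by hand; it is not the Army's procedure and would not
scale beyond a handful of applicants.

```python
from itertools import permutations


def best_assignment(utilities):
    """Return (slots, total) maximizing summed utility, one slot per applicant.

    utilities[i][j] is the (hypothetical) utility of placing applicant i in
    slot j. slots[i] is the slot index chosen for applicant i.
    """
    n = len(utilities)
    best_slots, best_total = None, float("-inf")
    for slots in permutations(range(n)):
        total = sum(utilities[i][slots[i]] for i in range(n))
        if total > best_total:
            best_slots, best_total = slots, total
    return best_slots, best_total


def first_come_assignment(utilities):
    """Each applicant in turn takes the best slot still open."""
    taken, total = set(), 0
    for row in utilities:
        j = max((k for k in range(len(row)) if k not in taken),
                key=lambda k: row[k])
        taken.add(j)
        total += row[j]
    return total
```

With the utilities `[[9, 2, 7], [6, 4, 3], [5, 8, 1]]`, placing applicants
one at a time yields a total of 14, while the joint optimum assigns slots
`(2, 0, 1)` for a total of 21. The gap is what a utility-based
classification algorithm is meant to recover.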

Initial  Training 

After processing at a Reception Battalion, all non-prior service Army
recruits are assigned to a basic training program (BCT) of 8 weeks which is
followed, with few exceptions, by a period of advanced individual training
(AIT), designed to provide basic entry-level skills. Entrants into the combat
arms and the military police receive both their basic training and their AIT
at the same Army base (One Station Unit Training) in courses of about 3-4
months' total duration. Those assigned to other specialties are sent to
separate Army technical schools whose course lengths vary considerably,
depending upon the technical complexity of the MOS. The diversity of course
offerings is illustrated by the fact that the Army provides initial skills
training in about 240 separate courses.*

In contrast to earlier practice, most enlisted trainees do not currently
receive school grades upon completion of their courses, but are evaluated
under Pass/Fail criteria. Those initially failing certain portions of a
course are recycled. The premise is that slower learners, given sufficient
time and effort under self-paced programs, can normally be trained to a
satisfactory level of competence, and that this additional training investment
is cost-effective. Those who continue to fail the course may be reassigned to
other, often less demanding specialties or discharged from service. One
consequence of these practices is to limit the usefulness of the selection/
classification practices as predictors of later performance.

Performance in Army Units

Upon assignment to an Army unit, most of the personnel actions affecting
the career of the first-term enlistee are initiated by his or her immediate
supervisor and/or the unit commander. These include the nature of the duty
assignment, the provision of on-the-job or unit training, and assessments of
performance, both on and off the job. These assessments influence such
decisions as promotion, future assignment, and eligibility for reenlistment,
as well as possible disciplinary action (including early discharges from
service).

To assure that these processes are administered fairly and consistently,
in a manner compatible with broader Army objectives, the various aspects of
enlisted personnel management are governed by detailed Army regulations. Army
Regulation 600-211, The Enlisted Personnel Management System, and related
regulations cover such subjects as enlisted personnel evaluation and
promotion, while AR 601-280, The Army Reenlistment Program, prescribes the
qualifications for reenlistment.

*Department of Defense, Military Manpower Training Report for 1982, March
1981, p. II-4.

During an initial 3-year enlistment term, the typical enlistee can
expect to progress to pay grade E-4, although advancement to higher pay grades
for specially qualified personnel is not precluded. Authority to promote
qualified personnel up to grade E-4 is delegated to unit commanders; promotion
to higher grades is numerically restricted and must be approved either by
field grade commanders for grades E-5 and E-6 or by HQDA for grades E-7
through E-9. Promotion to E-2 is almost automatic after 6 months of service.
Promotions to grades E-3 and E-4 normally require completion of certain
minimum periods of service (12 and 24 months, respectively), but are subject
to certain numerical strength limitations and specific commander approval.
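The time-in-service minimums just described can be written as a small lookup.
This sketch covers only the service-time condition; the strength limitations
and commander approval that the text also mentions are not modeled, and the
function name is hypothetical.

```python
# Minimum months of service stated in the text for first-term promotions.
MIN_MONTHS_OF_SERVICE = {"E-2": 6, "E-3": 12, "E-4": 24}


def grades_open_by_time(months_of_service: int) -> list:
    """Grades whose service-time minimum has been met (time condition only)."""
    if months_of_service < 0:
        raise ValueError("months_of_service cannot be negative")
    return [
        grade
        for grade, minimum in MIN_MONTHS_OF_SERVICE.items()
        if months_of_service >= minimum
    ]
```

A soldier with 12 months of service has met the time condition for E-2 and
E-3 but not yet for E-4.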

Unit  commanders  also  have  the  authority  to  reduce  assigned  soldiers  in  pay 
grade,  based  on  misconduct  or  inefficiency. 

The  Enlisted  Evaluation  System  provides  for  an  evaluation  both  of  the 
soldier's  proficiency  in  his  or  her  MOS  and  of  overall  duty  performance.  The 
process  includes  a  subjective  evaluation  based  on  supervisory  performance 
appraisal and ratings that are conducted at the unit level under prescribed
procedures, and an objective evaluation based on the results of a Skill
Qualification Test (SQT). The latter is a criterion-referenced,
paper-and-pencil performance-knowledge test which evaluates the soldier's
ability to perform critical job tasks satisfactorily. The responsibility for
planning and developing the SQT and for validating its results lies with the
U.S. Army Training Support Center of the Training and Doctrine Command
(TRADOC); actual administration of the tests has been delegated to each of the
major Army commands.

The  current  SQTs  are  developed  primarily  by  individuals  (e.g.,  enlisted 
personnel,  officers,  and  civilians)  who  are  knowledgeable  about  task  elements 
and  performance  requirements  but  are  not  trained  as  test  designers. 

Reenlistment Screening

The final stage of personnel processing of first-term enlisted personnel
is screening for reenlistment eligibility which, as described in AR 601-280,
considers such criteria as disciplinary records; aptitude area scores (based
on ASVAB or its predecessors); low SQT scores, when applicable; and slow grade
progression "resulting from a pattern of marginal conduct and/or performance."
Enlisted personnel who do not meet certain minimum standards under these
criteria must be approved by the Commanding General of the Personnel Command
before being processed for reenlistment.

The cumulative losses due to attrition, reenlistment screening, and
non-reenlistment of eligible personnel have resulted in the progressive
diminution of initial Army cohorts to about 20-30 percent of their original
numbers by the time they enter the fourth year of enlisted service. Moreover,
not all of the group that remains are retained or wish to be retained in their
original specialties, since an offer of retraining is often an inducement for
reenlistment. The cumulative impact of this skill drain upon the Army is
considerable.

SUMMARY

Even  this  brief  description  of  the  system  illustrates  the  complexity  of 
the  Army's  personnel  decision-making  requirements  and  the  large  number  of 
parameters  that  must  be  taken  into  account.  In  addition,  decisions  must  be 
made  for  a  very  large  flow  of  individuals  within  a  very  short  time  frame.  In 
this  regard  the  Army  faces  a  much  more  difficult  personnel  management  task 
than virtually any other organization. More effective
selection/classification/promotion strategies would pay large dividends.
Clalna  lalak
Oannis  laltslay 
larbara  larbosa 
Iruea  largo 
Jaff  lamas 
Prod  lamatta 


1111  laylor 

Janas  lackhan 

Tracy  Innton 
Cindy  Nrash 
Tharata  lorry 
yhllla  lobko 
Wally  loman 
Nika  losshardt 


Joyca  lowars 
Tad  lowlar 
Kichard  Iradflald 
Allan  Iraawall 
David  Irandt 
Tin  Irenson 
Kinoarly  Iroeks 
Diana  Irown 
Malinda  Irowna 
Susanna  Iroamo 
Sandra  luanahora 


lobblo  Cadwall 
Charallna  Cain 
Char lotto  Canpball 
John  Campbell
Kinborly  Canpball 
Roy  Canpball 
Dava  Canpshura 
Cay  Cam 

Miehaol  Carrigan 
Cary  Cartar 
A1  Castalll 
Jana  Challbtrg 
Wal  Jing  Chin 
John  Claudy 


Prajnet  A  la^laynas 


Brady  Coats 
Stava  Coats 
Adrianna  Colalla 
Ronald  Coltrass 
John  Connors 
Ten  Cook 
Vy  Vy  Corpa 
Hanry  Cunts 
Jannifor  Crafts 
Pan  Croon 
Linda  Culp 
Oanita  Curry 
Mark  Caamolawskl 

Arnold  Oanlals 
J.  L.  Otrrah 
Brag  Davis 
Bona  Davis 
Ralph  Davis 
lob  Davis 
Rosanary  Dawson 
Robin  Doan 
Tonny  doBron 
Katia  Oalaplana 
Donna  Danard 
Varna 11  Oannan 
Natalie  Dapp 
Oarrrn  Dlcaans 
Ani  DIFulo 
Danisa  Dinnan 
Bwondelyn  Dixon 
Tharasa  Doty 
Kin  Downing 
Jack  Doyle 
Elliabath  Ooyls 
Eueono  Druckar 
Dulols,  Dava 
Mary  Duffy 
Mika  Dunn 
Ml 11  Ian  Dunn 
Marv  Dunnette

Kant  Eaton 
Jo  Eduards 
Ed  Elsnar 
Ton  Elssanbarg 
Laon  Eldar 
Ernasto  Endaya 
Nnita  Evaro 

Chris  Palkar 
Dan  Falkar 
Lacy  Farguson 
Staphania  Fla Ids 
Santiago  Flarro 


Janica  Fishar 
Sub  Flalcaro 
Brant  Finning 
Pat  Ford 
Max  Foster 

lamatt  Baabal 

liana  Bast 
Nancy  fiasxarro 
Mika  Bayar 
Kavin  Bllnartin 
Marv  Bur 
Stava  Boffard 
Santos  fionxalaa 
Char lane  Bowar 
Francis  Brafton 
Francis  Bragg 
Sarnia  Brtan 
Miranda  Broan 
Kay  Brines 
Monica  Bustus 
filorla  Buth 

Joe  Hackney 
John  Hagans 
Cliff  Hahn 
Milton  Hakel
Blann  Hal Ian 
larbara  Hanltlon 
Laurel  Hanllton 
David  Hannanan 
Larry  Nansar 
Mary  Ann  Hinton 
Chuck  Harnett 
Andrea  Harris 
Carolyn  Harris 
Delaine  Harris 
Jin  Harris 
Ingrid  Haintehn 
Susan  Hwn 
Irian  Hllburn 
Carolyn  Hill 
Susan  Hill 
Michael  Hlllalsohn 
Rota  HI las 
Candace  Heffnan 
Bona  Hoffnan 
Peggy  Hoffnan 
Anna  Holconb 
Sandy  Holland 
Kathy  Holloway 
Barkalay  Holnas 
Nllllan  Heines 
Susanna  Horvath 
Laaetta  Hugh 


Marlu  Hugh 
Marv  Hugh 
Janis  Hutton 
Nancy  Huffnan 
Harold  Hull 
Lloyd  Humphreys
Ton  Hydock 

Robert  Jagart 
Enna  Janas 
Dick  Janlasen 
Bragory  Jaffarson 
Lawranea  Johnson 
Pan  Johnson 
Sal 11a  Johnson 
Scott  Johnson 
Ednt  Johnston 
Agnat  Jonas 
Danisa  Jonas 
Ball  Jonas 
John  Joyner 

John  Kanp 
Kan  Kalsar 
Sue  Katkinan 
Hag  Kayos 
Mary  Kirknan 
Deirdre  Knapp
Ton  Krackar 
Manuals  Kress 
Dob  Krug 
Dick  Krullk 
Ruthana  Krullk 
Doug  Kuhn 
Susan  Kuthnar 


Stava  Laanlain 
Chris  Larsan 
Alan  Lau 
Pat  Lawler 
Debra  Lewis 
Kathy  Lillie 
Cellun  Lincoln 
Tlnothy  Ling 
Robert  Linn 
Camen  London 
Jin  Lucas 

Wynn  MacDonald 
Rod  McCloy
lurthall  McCIIntock 
Tin  McCollun 
Ranu  McCord 
Deborah  McDaniels 
Dan  McGarvey 
Allison  McGrady
Mtt  Mete* 

Lci1*y  NcHal* 
Jeff  McHenry
Don  NcLiughlln 
•rian  McNeill 
Carol  Manning 
Dobbla  Marcua 
Clatian  Martin 
Dtrrick  Martin 
Fran  Martin 
Nary  Martin 
Scott  Martin 
tary  Maui 
Batty  Nay 
Mary  Htdvad 
Vincent  Mallllo 
Jana  Mall 
Bay  Mantel 
Ooiorai  Millar 
Yvonne  Millar 
Jaanatta  Ml  Ulnar 
tea  Milner 
Carol  Miner 
Karan  Mitchell 
Clluhath  Noara 
Joe  Moor* 

Kite  Morlay 
Stave  Hetewldio 
Kabacca  Hordinl 
Stave  Morrii 
Brag  Mother 
Oebbi*  Myart 

Sandy  Napier 
Britt  Nawtoa 
Dianna  Nllian 
John  Novak 

Bridgatta  O'Brien 
Leonardo  01 It 
Joe  OlMitaad 
Oarlona  Olion 
Scott  Opplar 
Bill  Otb^n 
Mllgualla  Otero 
Cynol  OMnioKurtx 
Nark  ONani-Kurtx 
Bobacca  Oxferd- 
Carpantar 

Bimca  Falaar 
Pat  Parhaa 
Navaa  Park 

Cheryl  Pauli In 
Bridget  Paoplat 
Anthony  Parai 


Nora  Ntanon 
Scott  Patorton 
llaa  Pineda 
Nvarly  Pepalka 
Rant  Pertar 
Ronald  Potilton 
Joann  Proctor 
Phyllla  Pryia 
Donovan  Puffar 
Elaine  Pulakos

Paul  Radtka  • 

David  Ball 
Jodi  Raynoldi 
Barry  Blagalhaupt 
Robin  Blagalhaupt 
Diana  Kay  Ring 
Oava  Rivkin 
Lori  Robarion 
Michael  Robinton 
Don  Rogan 
Andy  Rote 
Sharon  Roia 
Harvey  Rotanbaun 
Richard  ftoianblatt 
Rod  Roue 
Paul  Roiiaalitl 
Michael  Ruatay 
Darlene  Russ-Eft
Taraaa  Ruiiell 

Bob  Sadacca 
Leila  Sandart 
Martha  Sandart 
Suian  Sehachtaan 
Liddy  Schneider 
Shaiu  Schuitt 
Aurelia  Scott 
Cindy  Seale 
Jaannatta  Sakai lick 
Batia  Sharon 
Batty  Shelley 
Donna  Shephard 
Harr 1 I  Shettal 
Joyce  Shtu'.di 
Paula  Singleton 
Nancy  Skilling 
Dab  SkophaaBor 
Elliabath  Salth 
Helen  Sperling 
Bill  Spruill 
Jin  Stallingi 
Louis  Staavt 
Cath,  Stawarakl 
Brian  Stern 
Sutia  Starn 
Bayla  Stifle 


Ron  Still 
Arthur  Stono 
Malania  Stylet 
Rod  Sywai 
Phil  Sxenat 


Zarva  Taru 
Connie  Taylor 
Elaine  Taylor 
Mary  Anna  Taylor 
Mary  Tenopyr
Barb  Thoaat 
Alice  Thcipton 
ToB  Tiffany 
Dor  It  Torrei 
Jody  Toquam
DIcK  Trotter 
Suzonna  Tiacouali 
Caraen  Tyaintkl 

Jay  Uhlaner
Richard  Urich 


Theraia  Van  Hoy 
Robert  Vinebarg 


Andrew  Walton 
M1ng>Be1  Wang 
Ray  Halnberg 
Lindt  want 
Caorga  Wheaton 
OabbU  Whatzal 
Clint  Walker 
Leon  Watrogan 
Lan  White 
Todd  Win 
Anno  W1111a»% 
Walter  Wllllaas 
Effla  Wilton 

Tkiir*'loiM  y41afin 


Hilda  Wing 
Laurie  Wlta 
Ronlta  Wlinlowtkl 
Heine  Woodard 
Diatie  Woodnia 
Marcia  Wojtko 
Barky  Wyant 
Carol  '..yiaan 


Winnie  Young 


Laurie  Zaugg 
Ray  Zlnwrean 
Lola  Zook 